THE COMPUTATIONAL COMPLEXITY OF CALCULATING PARTITION
FUNCTIONS OF OPTIMAL MEDIANS WITH HAMMING DISTANCE

ISTVÁN MIKLÓS AND HEATHER SMITH


Abstract. In this paper, we show that calculating the partition function of optimal medians of binary
strings with Hamming distance is #P-complete for several weight functions. The case when the weight
function is the factorial function has application in bioinformatics. In that case, the partition function
counts the most parsimonious evolutionary scenarios on a star tree under several models in bioinformatics.
The results are extended to binary trees and we show that it is also #P-complete to count the most
parsimonious evolutionary scenarios on an arbitrary binary tree under the substitution model of biological
sequences and under the Single Cut-or-Join model for genome rearrangements.


1. The Partition Function 

For $n, m \in \mathbb{Z}^+$, fix a multiset of binary strings $B = \{\nu_1, \nu_2, \ldots, \nu_m\}$ where $\nu_i \in \{0,1\}^n$ for each $i \in [m]$.
An optimal median $\mu$ is a binary string also in $\{0,1\}^n$ which minimizes $\sum_{i=1}^{m} H(\nu_i, \mu)$, where $H(\nu_i, \mu)$ is the
Hamming distance between $\nu_i$ and $\mu$. Define $\mathcal{M}(B)$ to be the set of all optimal medians for the multiset $B$.
Take $f$ to be a non-negative, real-valued function. In this paper, we examine the partition function

$$Z(B, f(x)) = \sum_{\mu \in \mathcal{M}(B)} \prod_{i \in [m]} f(H(\nu_i, \mu)).$$
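As a concrete illustration of the definition, the following brute-force sketch enumerates all of $\{0,1\}^n$, keeps the strings of minimum total Hamming distance, and sums the product of weights over them. It is exponential in $n$ and is meant only for tiny examples; the function names are illustrative choices, not notation from the paper.

```python
from itertools import product
from math import factorial, prod


def hamming(a, b):
    """Number of coordinates in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def optimal_medians(B):
    """All strings in {0,1}^n that minimize the total Hamming distance to B."""
    n = len(B[0])
    cost = {mu: sum(hamming(nu, mu) for nu in B)
            for mu in product((0, 1), repeat=n)}
    best = min(cost.values())
    return [mu for mu, c in cost.items() if c == best]


def partition_function(B, f=factorial):
    """Z(B, f): sum over optimal medians mu of prod_i f(H(nu_i, mu))."""
    return sum(prod(f(hamming(nu, mu)) for nu in B)
               for mu in optimal_medians(B))
```

For $B = \{00, 11\}$ every string of length 2 is an optimal median, and `partition_function(B)` evaluates $0!\,2! + 1!\,1! + 1!\,1! + 2!\,0! = 6$.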


The case when $f(x) := x!$ has application to phylogenetic trees and genome rearrangement. Under the
Single Cut-or-Join (Feijão and Meidanis 2011) model for genome rearrangement, genomes are represented
as edge-labelled directed graphs forming paths and cycles, where the direction of the edges along any path or
cycle might vary. Such a graph can be encoded as a binary string in which each possible pair of edge endpoints
(adjacency) is represented by one bit. The bit is 1 if the adjacency is present in the genome and 0
otherwise. Note that although any genome can be represented by such a binary string, not all binary
strings represent a genome: two adjacencies are in conflict if they share a common edge endpoint,
and then their bits cannot both be 1. A mutation is a bit flip: a flip from 0 to 1 represents a join, a flip
from 1 to 0 represents a cut. Any flip from 1 to 0 is possible; however, flipping 0 to 1 is possible only if it
does not cause a conflict. Still, it can be proved that if two binary strings $\mu$ and $\nu$ represent two genomes
$G_1$ and $G_2$ under the Single Cut-or-Join model, the fewest number of mutations needed to transform $G_1$ into $G_2$
is $H(\mu, \nu)$ (Feijão and Meidanis 2011). A scenario is an ordering of the necessary mutations such that each
intermediate string obtained from performing the cuts and joins one at a time represents a valid genome. An
upper bound on the number of scenarios is $H(\mu, \nu)!$. This upper bound is precisely the number of scenarios
if there is no conflict in the adjacencies present in genomes $G_1$ and $G_2$; that is, there is no constraint that
some of the adjacencies must first be cut before some other adjacencies are created with joins.

Next fix a multiset, $B$, of $m$ binary strings from $\{0,1\}^n$. If these strings label the leaves of a star tree $K_{1,m}$,
the center of the star, or common ancestor, should be labeled with a median from $\mathcal{M}(B)$, which minimizes
the number of mutations required, summed over all edges of the star tree. A most parsimonious scenario
for $K_{1,m}$ with $B$ labeling the leaves consists of a median $\mu$ from $\mathcal{M}(B)$ and, to label the edges of the star
tree, a scenario transforming $\mu$ to $\nu_i$ for each $i \in [m]$. The partition function $Z(B, x!)$ counts the number of
most parsimonious scenarios if there is no conflict in the adjacencies present. In this paper, we show that
counting the most parsimonious scenarios is computationally hard already for these special cases.

Other bioinformatics models also use strings and changes of characters in them, for example when the
strings represent biological sequences (DNA, RNA or protein) and changes of characters represent substitutions.
We will refer to these models as substitution models of biological sequences (Felsenstein 2003). The
negative results presented in this paper also hold for these models.

In Section 2 we establish some basics about computational complexity classes. Sections 3 and 4 are
devoted to computing the value of $Z(B, x!)$. In Section 5 we explore the possibility of stochastic approximations
for $Z(B, x!)$. Then we turn our attention to the more general $Z(B, f(x))$ in Section 6. When $\log f(x)$
is strictly concave up or strictly concave down, we obtain some further computational complexity results
under mild restrictions. Section 7 is devoted to stochastic approximations for $Z(B, f(x))$ when $\log f(x)$ is
strictly concave down. In Sections 8 and 9 we extend our exploration of $Z(B, x!)$ from star trees to binary
trees and define a similar partition function.


2. Computational complexity 


While P and NP are complexity classes for decision problems, the following classes are for counting
problems. The classes #P, #P-hard, and #P-complete were first defined by Valiant (1979). The definition
for #P that we give here, while not the original, is an equivalent definition.


Definition 2.1 (Welsh 1993). The class #P contains those functions $f : \Sigma^* \to \mathbb{N}$, for some alphabet $\Sigma$,
such that both of the following hold:

• There is a polynomial $p$, a relation $R$, and a polynomial time algorithm which, for each input $w \in \Sigma^*$
and each $y \in \Sigma^*$ with $|y| \le p(|w|)$, determines if $R(w, y)$.

• For any input $w$, $f(w) = |\{y : |y| \le p(|w|) \text{ and } R(w, y)\}|$.


Definition 2.2 (Valiant 1979). A counting problem is in #P-hard if there is a polynomial time reduction
to it from every problem in #P. A counting problem is in #P-complete if it is in #P and is in #P-hard.

Next we give a few known computational complexity results. To state these results, we establish some
terminology.

Conjunctive normal form (CNF) is a standard format in which to express Boolean formulas. A 3CNF is
a Boolean formula $F$ which is the conjunction of clauses and each clause is the disjunction of 3 literals. A
3CNF, $F$, with $n$ variables $\{v_1, v_2, \ldots, v_n\}$ and $k$ clauses takes the form $F = c_1 \wedge c_2 \wedge \ldots \wedge c_k$ where each
$c_i$ is a clause which is the disjunction of three literals and the literals are from $\{v_i\}_{i=1}^n \cup \{\overline{v_i}\}_{i=1}^n$. Because
$F$ was said to have $n$ variables, we may assume that, for each $i \in [n]$, $v_i$ or $\overline{v_i}$ appears in some clause of $F$.
Each $v_i$ is a positive literal while each $\overline{v_i}$ is a negative literal. The negative literal $\overline{v_i}$ is the negation of $v_i$.
We identify $\overline{\overline{v_i}}$ with the literal $v_i$ and we refer to $\{v_i\}_{i=1}^n$ as the variables of $F$ and always assume that the
set of variables has an ordering.

A truth assignment for $F$ is a function $f : \{v_i\}_{i=1}^n \to \{\text{true}, \text{false}\}$ which assigns a value of true or false to each
variable. If a truth assignment makes $F$ true, we say it satisfies $F$. Otherwise, a truth assignment does not
satisfy $F$, in which case there is at least one clause which is not satisfied.

Definition 2.3 (3SAT). Given an arbitrary F in 3CNF with n variables and k clauses, decide if there is a 
truth assignment for F which satisfies F. 


Definition 2.4 (#3SAT). Given an arbitrary Boolean formula $F$ in 3CNF with $n$ variables and $k$ clauses,
count the number of truth assignments which satisfy $F$.


Theorem 2.5 (Cook 1971). 3SAT $\in$ NP-complete.


Theorem 2.6 (Valiant 1979). #3SAT $\in$ #P-complete.


Define D3CNF to be the subset of 3CNF containing only those $F = \bigwedge_{i \in [k]} c_i$ such that for each $i \in [k]$,

• $c_i$ contains three distinct literals, and

• $c_i$ does not contain both $v_j$ and $\overline{v_j}$ for any $j \in [n]$.

This defines the following two problems.


Definition 2.7 (D3SAT). For an arbitrary $F$ in D3CNF with $n$ variables and $k$ clauses, decide if there is a
truth assignment which satisfies $F$.

Definition 2.8 (#D3SAT). For an arbitrary $F$ in D3CNF with $n$ variables and $k$ clauses, count the number
of truth assignments which satisfy $F$.

The following two results are proven through reductions from #3SAT and 3SAT.

Lemma 2.9. #D3SAT $\in$ #P-complete.

Proof. This is a reduction from #3SAT. Let $F$ be a 3CNF with $n$ variables and $k$ clauses, $n \ge 3$. Let
$v_\alpha, v_\beta, v_\gamma$ be literals in $F$ with $\alpha \ne \beta \ne \gamma \ne \alpha$. Observe that each of the following pairs have the same
satisfying truth assignments.

$(v_\alpha \vee v_\beta \vee v_\beta)$ and $(v_\alpha \vee v_\beta \vee v_\gamma) \wedge (v_\alpha \vee v_\beta \vee \overline{v_\gamma})$.

$(v_\alpha \vee v_\alpha \vee v_\alpha)$ and $(v_\alpha \vee v_\beta \vee v_\gamma) \wedge (v_\alpha \vee \overline{v_\beta} \vee v_\gamma) \wedge (v_\alpha \vee v_\beta \vee \overline{v_\gamma}) \wedge (v_\alpha \vee \overline{v_\beta} \vee \overline{v_\gamma})$.

Further, a clause of the form $(v_\alpha \vee \overline{v_\alpha} \vee v_\beta)$ is always true, so it can be removed.

Making these replacements in $F$ will result in a D3CNF $F'$ with $n'$ variables ($n' \le n$) and at most $4k$
clauses. Because some clauses like $(v_\alpha \vee \overline{v_\alpha} \vee v_\beta)$ are in $F$ but not in $F'$, it is possible that $n' < n$.

Given a satisfying truth assignment for $F'$, we may extend it to a satisfying truth assignment for $F$ in $2^{n-n'}$
ways. This is because the variables in $F$ which are not in $F'$ do not affect the ability of a truth assignment
to satisfy $F$. On the other hand, each satisfying truth assignment for $F$, restricted to the variables of $F'$, will
be a satisfying truth assignment for $F'$. □
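The replacement rules in the proof can be sketched as follows, assuming literals are encoded as nonzero integers ($+i$ for $v_i$, $-i$ for $\overline{v_i}$), an encoding chosen here purely for illustration. The padding variables are taken to be the smallest indices not already in the clause, which is one arbitrary choice among many. The brute-force counter is included only to check the $2^{n-n'}$ relation on small inputs.

```python
from itertools import product


def to_d3cnf(clauses, n):
    """Rewrite a 3CNF over variables 1..n (n >= 3) into an equivalent D3CNF."""
    out = []
    for clause in clauses:
        lits = sorted(set(clause), key=abs)
        if any(-l in lits for l in lits):
            continue  # contains a variable and its negation: always true, drop
        used = {abs(l) for l in lits}
        spare = [v for v in range(1, n + 1) if v not in used]
        if len(lits) == 3:
            out.append(tuple(lits))
        elif len(lits) == 2:
            # (a or b or b) becomes (a or b or g) and (a or b or not g)
            g = spare[0]
            out += [(lits[0], lits[1], g), (lits[0], lits[1], -g)]
        else:
            # (a or a or a) becomes four clauses over both signs of b and g
            b, g = spare[0], spare[1]
            out += [(lits[0], sb * b, sg * g)
                    for sb in (1, -1) for sg in (1, -1)]
    return out


def count_sat(clauses, variables):
    """Brute-force count of assignments to `variables` satisfying all clauses."""
    variables = sorted(variables)
    total = 0
    for bits in product((False, True), repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(l)] == (l > 0) for l in c) for c in clauses):
            total += 1
    return total
```

If $F'$ is the output for $F$ and $n'$ is the number of variables that actually occur in $F'$, then `count_sat(F, range(1, n + 1))` should equal $2^{n-n'}$ times `count_sat(F', vars_of_F')`.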

Lemma 2.10. D3SAT $\in$ NP-complete.

Proof. As described in the last proof, for any 3CNF $F$, there is a corresponding D3CNF $F'$ which is
computable in polynomial time such that $F'$ has at least one satisfying truth assignment exactly when $F$ has at
least one satisfying truth assignment. □

Next, we return our attention to the partition functions which were introduced in Section 1. The
complexity results in this paper address subquestions and analogues of the following problems.

Definition 2.11 (#SPS). Given a tree $T$ and a labeling $\varphi$ of the leaves of $T$ with binary strings, #SPS
(Small Parsimony Substitution) asks for the exact number of most parsimonious scenarios where the scenario
on an edge is an ordering of the substitutions that must take place to transform the median sequence into the
sequence at the leaf.

Definition 2.12 (#SPSCJ). Given a tree $T$ and a labeling $\varphi$ of the leaves of $T$ with binary strings
representing genomes under the SCJ model, #SPSCJ (Small Parsimony Single Cut-or-Join) asks for the exact
number of most parsimonious scenarios where the scenario on an edge is an ordering of the cuts and joins
that must take place to transform the median sequence into the sequence at the leaf such that the sequence
produced after each cut or join represents a valid genome.

Clearly, #SPS is a special case of #SPSCJ, the case when there is no conflict in the adjacencies present
in the genomes assigned to the leaves of the evolutionary tree. While $Z(B, x!)$ is only an upper bound for
an instance of #SPSCJ, it is the exact answer to each instance of #SPS.

Lemma 2.13. The problem of calculating $Z(B, x!)$ is in #P.

Proof. The input includes a multiset $B = \{\nu_i\}_{i=1}^m$ where each $\nu_i$ is from $\{0,1\}^n$. Viewing this in the sense
of phylogenetic trees, a witness consists of a median $\mu$ to label the center of the star and a scenario to label
each edge of the tree. The size of the input is $O(mn)$.

Recall that a median $\mu'$ minimizes the quantity $\sum_{i=1}^m H(\nu_i, \mu')$. We can find a single median $\mu'$ in $O(mn)$
time by examining each coordinate of the strings in $B$ and making that coordinate of $\mu'$ the value that
appears in a majority of the strings in $B$, breaking ties arbitrarily. To verify that the given binary string $\mu$
is indeed a median, we need only compare $\sum_{i=1}^m H(\nu_i, \mu)$ and $\sum_{i=1}^m H(\nu_i, \mu')$. If they are the same, then $\mu$
is a median. For each edge, we can verify that the given permutation is a scenario for that edge in $O(m)$
time by comparing the bits of $\mu$ and $\nu_i$. By Definition 2.1, #SPS is in #P. □
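The witness check described in this proof can be sketched as below for the conflict-free (substitution) setting. The function names are illustrative, ties in the majority vote are broken toward 0 as one arbitrary choice, and a scenario on an edge is modelled simply as a list of coordinate indices.

```python
def hamming(a, b):
    """Number of coordinates in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def majority_median(B):
    """One optimal median: the majority bit in each coordinate (ties -> 0)."""
    m = len(B)
    return tuple(1 if 2 * sum(column) > m else 0 for column in zip(*B))


def is_median(mu, B):
    """mu is optimal iff its total distance equals that of the majority median."""
    reference = majority_median(B)
    return (sum(hamming(nu, mu) for nu in B)
            == sum(hamming(nu, reference) for nu in B))


def is_scenario(order, mu, nu):
    """A scenario is a permutation of the coordinates where mu and nu differ."""
    differing = [s for s, (a, b) in enumerate(zip(mu, nu)) if a != b]
    return sorted(order) == differing
```

Both checks run in time polynomial in $m$ and $n$, which is all the membership argument needs.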

Most of the complexity results are reductions from #D3SAT. In other words, given a D3CNF $F$ with
$n$ variables and $k$ clauses, we create a multiset of $m$ binary strings of length $2n + t$ (where $t$ and $m$ are
polynomials of $n$ and $k$) to label the leaves of the tree. These strings will be chosen so that the number of
most parsimonious substitution scenarios is related to the number of satisfying truth assignments for $F$.
3. Set-up for results on star trees

The discussion in this section and the next is specific to $Z(B, x!)$. This section details some tools and
constructions that will be needed for the proof of our main result in Section 4. This main result, Theorem 4.4,
states that computing $Z(B, x!)$ is #P-complete.

The proof of Theorem 4.4 will define a polynomial reduction from #D3SAT (Definition 2.8) to computing
$Z(B, x!)$. Fix an arbitrary D3CNF, $F$, with $n$ variables and $k$ clauses. Fix a prime $p < 5\max\{300, n + 5\}$
which will be utilized later.

Our task is to define a multiset of binary strings $\mathcal{D}(p)$ to encode $F$. The multiset $\mathcal{D}(p)$ will be chosen so
that the set of medians $\mathcal{M}(\mathcal{D}(p))$ will have a list of desired characteristics. First, each string in $\mathcal{D}(p)$ and
each median in $\mathcal{M}(\mathcal{D}(p))$ will have length $2n + t$ with coordinates

$(x_1, y_1, x_2, y_2, \ldots, x_n, y_n, e_1, e_2, \ldots, e_t)$

where $n$ is the number of variables in $F$ and $t$ is a polynomial of $n$ and $k$ which will be defined later.
Second, $\mathcal{M}(\mathcal{D}(p))$ will be the set of all binary strings $\mu$ of length $2n + t$ that have $\mu[e_i] = 0$ for each $i \in [t]$. In
other words, $\mathcal{D}(p)$ will be defined so that $\mathcal{M}(\mathcal{D}(p))$ equals $\{0,1\}^{2n} \times \{0\}^t$. Let $\mathcal{M}'(\mathcal{D}(p))$ denote the subset
of $\mathcal{M}(\mathcal{D}(p))$ with the additional property that $\mu[x_i] \ne \mu[y_i]$ for all $i \in [n]$. Once we have established that
$\mathcal{M}(\mathcal{D}(p)) = \{0,1\}^{2n} \times \{0\}^t$, we can conclude $\mathcal{M}'(\mathcal{D}(p)) = \{01, 10\}^n \times \{0\}^t$. This allows for a connection
with truth assignments for $F$.

Definition 3.1. Let $n \in \mathbb{Z}^+$. For arbitrary $F$ in D3CNF with $n$ variables, let $S$ be a multiset of binary
strings on the coordinates $(x_1, y_1, \ldots, x_n, y_n, e_1, \ldots, e_t)$. There is an injective function $f$ which assigns to
each median $\mu \in \mathcal{M}'(S)$ a truth assignment for $F$. In particular, $f(\mu)$ will assign a value of true to the
variable $v_i$ of $F$ if $\mu[x_i] = 1$ and false if $\mu[x_i] = 0$.

Remark 3.2. If the multiset $S$ is chosen so that $\mathcal{M}'(S) = \{01, 10\}^n \times \{0\}^t$, then Definition 3.1 provides a
bijection between $\mathcal{M}'(S)$ and the truth assignments for $F$.

Definition 3.3. Let $n \in \mathbb{Z}^+$. Given an arbitrary D3CNF, $F$, with $n$ variables, let $S$ be an arbitrary multiset
of binary strings on the coordinates $(x_1, y_1, \ldots, x_n, y_n, e_1, \ldots, e_t)$ for some $t \in \mathbb{Z}^+$. Define $\mathcal{M}'_F(S)$ to be a
subset of $\mathcal{M}'(S)$, containing only those medians which, through the bijection in Definition 3.1, correspond
to a satisfying truth assignment for $F$. Since a single clause $c$ in $F$ is also a D3CNF, this defines $\mathcal{M}'_c(S)$ as
well.

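To calculate

$$Z(\mathcal{D}(p), x!) = \sum_{\mu \in \mathcal{M}(\mathcal{D}(p))} \prod_{i \in [m]} H(\mu, \nu_i)!,$$

we first calculate $\prod_{i \in [m]} H(\mu, \nu_i)!$ for each median $\mu \in \mathcal{M}(\mathcal{D}(p))$. The multiset $\mathcal{D}(p)$ will be constructed so
that there is a constant $K(p)$ (specified in a later claim) which is not a multiple of $p$ and such that for any
$\mu \in \mathcal{M}'_F(\mathcal{D}(p))$, $\prod_{i \in [m]} H(\mu, \nu_i)! = K(p)$. Each median $\mu \in \mathcal{M}(\mathcal{D}(p)) \setminus \mathcal{M}'_F(\mathcal{D}(p))$ will have
$\prod_{i \in [m]} H(\mu, \nu_i)! \equiv 0 \mod p$. As a result

$$\sum_{\mu \in \mathcal{M}(\mathcal{D}(p))} \prod_{i \in [m]} H(\mu, \nu_i)! \equiv |\mathcal{M}'_F(\mathcal{D}(p))| \, K(p) \mod p.$$

Repeating this construction for sufficiently many primes $p < 5\max\{300, n + 5\}$, we obtain enough
congruences, which together with the knowledge that there are at most $2^n$ satisfying truth assignments for $F$,
uniquely determine the size of $\mathcal{M}'_F(\mathcal{D}(p))$, which is equal to the number of satisfying truth assignments for
$F$.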
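To make the last step concrete, the sketch below recovers an integer $N$ (standing in for the number of satisfying truth assignments) from residues $z_p \equiv N \cdot K(p) \pmod p$ using the Chinese Remainder Theorem. The triple format `(z, K, p)` and the function name are assumptions made for illustration; the sketch assumes the primes are distinct and that no $p$ divides its $K(p)$, as stated above.

```python
def crt_count(congruences):
    """Combine triples (z, K, p) with N * K = z (mod p) into (N mod P, P),
    where P is the product of the primes. Requires Python 3.8+ for pow(x, -1, p)."""
    N, modulus = 0, 1
    for z, K, p in congruences:
        # Residue of N modulo p; the inverse exists because p does not divide K.
        r = z * pow(K, -1, p) % p
        # Lift: choose s so that N + modulus * s is congruent to r modulo p.
        s = (r - N) * pow(modulus, -1, p) % p
        N, modulus = N + modulus * s, modulus * p
    return N, modulus
```

The returned value equals the true count as soon as the product of the primes exceeds $2^n$, since the count is known to lie between 0 and $2^n$.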

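Later we will see that the main work goes into developing a multiset $\mathcal{D}(p)$ with the property that any
$\mu \in \mathcal{M}'_F(\mathcal{D}(p))$ and any $\mu' \in \mathcal{M}'(\mathcal{D}(p)) \setminus \mathcal{M}'_F(\mathcal{D}(p))$ have

$$\prod_{i \in [m]} H(\mu, \nu_i)! \ne \prod_{i \in [m]} H(\mu', \nu_i)!.$$

In Section 3.1 we define the strings in $\mathcal{D}(p)$ which are used in the proofs to distinguish medians in $\mathcal{M}'_F(\mathcal{D}(p))$
from medians in $\mathcal{M}'(\mathcal{D}(p)) \setminus \mathcal{M}'_F(\mathcal{D}(p))$.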
3.1. Encoding Boolean clauses in binary strings. A truth assignment satisfies $F$ if and only if it satisfies
every clause in $F$. Hence, we will encode each clause $c_i$ of $F$ in a set of 50 strings

$\mathcal{C}_i := \{\nu_1^i, \nu_2^i, \ldots, \nu_{50}^i\}$

which will be defined through Table 1. These 50 strings are designed to distinguish those medians in $\mathcal{M}'_{c_i}(\mathcal{C}_i)$
from those in $\mathcal{M}'(\mathcal{C}_i) \setminus \mathcal{M}'_{c_i}(\mathcal{C}_i)$. Confirmation of this will come in Section 3.3. Because every truth assignment
that does not satisfy $F$ has at least one clause in $F$ that it does not satisfy, we will see that the disjoint union
$\biguplus_{i \in [k]} \mathcal{C}_i$ distinguishes between $\mathcal{M}'_F\big(\biguplus_{i \in [k]} \mathcal{C}_i\big)$ and $\mathcal{M}'\big(\biguplus_{i \in [k]} \mathcal{C}_i\big) \setminus \mathcal{M}'_F\big(\biguplus_{i \in [k]} \mathcal{C}_i\big)$.

The following definition gives a guide for defining a multiset of binary strings. 

Definition 3.4 (Defining strings). For arbitrary $m, n \in \mathbb{Z}^+$ and $t \in \mathbb{Z}^+ \cup \{0\}$, to define a multiset of binary
strings $\{\eta_1, \eta_2, \ldots, \eta_m\}$ on coordinates $(x_1, y_1, \ldots, x_n, y_n, e_1, \ldots, e_t)$, it suffices to

• define $\eta_j[x_i]$ and $\eta_j[y_i]$ for each $j \in [m]$ and $i \in [n]$, and

• define a function $e : [m] \to \mathbb{Z}^+ \cup \{0\}$.

We say $\eta_j$ has $e(j)$ additional ones. In order to infer the values $\eta_j[e_\ell]$ for each $j \in [m]$ and $\ell \in [t]$, follow
this procedure:

Partition $[t]$ into subsets $E, E_1, E_2, \ldots, E_m$ so that the size of $E_j$ is precisely $e(j)$, and $E = [t] \setminus \bigcup_{j \in [m]} E_j$.
For each $j \in [m]$ and each $\ell \in [t]$, set $\eta_j[e_\ell] = 1$ if and only if $\ell \in E_j$, and $\eta_j[e_\ell] = 0$
otherwise.
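The procedure can be sketched as follows, assuming the blocks $E_1, \ldots, E_m$ are taken to be consecutive intervals of $[t]$ (any partition with the right sizes would do) and that $t$ is at least the total number of additional ones. The function name and argument layout are illustrative only.

```python
def build_strings(xy_bits, extra_ones, t):
    """Append t e-coordinates to each string; string j gets its own block of
    e(j) ones, and all remaining e-coordinates are 0."""
    if sum(extra_ones) > t:
        raise ValueError("t must be at least the total number of additional ones")
    strings, start = [], 0
    for bits, e in zip(xy_bits, extra_ones):
        tail = [0] * t
        tail[start:start + e] = [1] * e   # E_j = {start, ..., start + e - 1}
        strings.append(tuple(bits) + tuple(tail))
        start += e
    return strings
```

For instance, three strings on $n = 1$ with $e = (2, 0, 1)$ and $t = 4$ receive the tails 1100, 0000 and 0010, so no $e$-coordinate carries a one in more than one string.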

Remark 3.5. Let $m \in \mathbb{Z}^+$. For an arbitrary multiset $\{\eta_j\}_{j=1}^m$ of binary strings built using Definition 3.4,
for each $\ell \in [t]$, there is at most one $j \in [m]$ such that $\eta_j[e_\ell] = 1$. Consequently, each $\mu \in \mathcal{M}(\{\eta_j\}_{j=1}^m)$ will have
$\mu[e_\ell] = 0$ for all $\ell \in [t]$ because $\mu$ must minimize $\sum_{j \in [m]} H(\eta_j, \mu)$.

Definition 3.6. Let $n \in \mathbb{Z}^+$ and $t \in \mathbb{Z}^+ \cup \{0\}$ be arbitrary. Two binary strings $\eta$ and $\overline{\eta}$ with coordinates
$(x_1, y_1, \ldots, x_n, y_n, e_1, \ldots, e_t)$ are said to be complementary on the first $2n$ coordinates if $\eta[x_i] = 1 - \overline{\eta}[x_i]$
and $\eta[y_i] = 1 - \overline{\eta}[y_i]$ for each $i \in [n]$.

The following fact will be useful.

Fact 3.7. Let $\eta$ and $\overline{\eta}$ be binary strings on coordinates

$(x_1, y_1, \ldots, x_n, y_n, e_1, \ldots, e_t)$.

Set $e(\eta) := \sum_{i \in [t]} \eta[e_i]$, the number of additional ones in $\eta$. Define $e(\overline{\eta})$ similarly. If $\eta$ and $\overline{\eta}$ are
complementary on the first $2n$ coordinates, then for any $\mu \in \{0,1\}^{2n} \times \{0\}^t$,

$$H(\mu, \eta) + H(\mu, \overline{\eta}) = 2n + e(\eta) + e(\overline{\eta}).$$

Proof. For each $i \in [n]$, either $\mu[x_i] = \eta[x_i]$ or $\mu[x_i] = \overline{\eta}[x_i]$, but not both. This is also true for each $y_i$. This
accounts for the $2n$ in the sum. Because $\mu[e_i] = 0$ for all $i \in [t]$, each $i \in [t]$ with $\eta[e_i] = 1$ will contribute
one to the sum. Also each $i \in [t]$ with $\overline{\eta}[e_i] = 1$ will contribute one to the sum. This completes the proof. □
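As a sanity check of this identity, the sketch below verifies it exhaustively over all $\mu \in \{0,1\}^{2n} \times \{0\}^t$ for one pair of strings. Strings are modelled as tuples whose first $2n$ entries are the $x$ and $y$ coordinates and whose last $t$ entries are the $e$ coordinates; this layout and the function name are assumptions for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of coordinates in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def check_fact_3_7(eta, eta_bar, n, t):
    """True if H(mu, eta) + H(mu, eta_bar) = 2n + e(eta) + e(eta_bar)
    for every mu in {0,1}^(2n) x {0}^t."""
    if any(a == b for a, b in zip(eta[:2 * n], eta_bar[:2 * n])):
        raise ValueError("strings are not complementary on the first 2n coordinates")
    target = 2 * n + sum(eta[2 * n:]) + sum(eta_bar[2 * n:])
    return all(
        hamming(head + (0,) * t, eta) + hamming(head + (0,) * t, eta_bar) == target
        for head in product((0, 1), repeat=2 * n)
    )
```

The right-hand side does not depend on $\mu$, which is what later allows complementary pairs to make every $x_i$ and $y_i$ coordinate ambiguous.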

Definition 3.8. Given an arbitrary multiset of binary strings $S$, we say that a coordinate $s$ is ambiguous
if there are exactly $\frac{1}{2}|S|$ binary strings $\eta \in S$, counted with multiplicity, such that $\eta[s] = 0$. Consequently, if
you change the value of a median at an ambiguous coordinate, you obtain another median. Note that if $|S|$
is odd, then there are no ambiguous coordinates and there is exactly one median.

Fact 3.9. Let $S$ be a multiset of binary strings which are defined on the coordinates

$(x_1, y_1, \ldots, x_n, y_n, e_1, \ldots, e_t)$.

If $S$ can be partitioned into pairs of strings where the two strings in a pair are complementary on the first
$2n$ coordinates, then each $x_i$ and each $y_i$ is an ambiguous coordinate.
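A short sketch of the ambiguity test follows, with strings again modelled as bit tuples and coordinates identified by their position. The function name is illustrative.

```python
def ambiguous_coordinates(S):
    """Positions s at which exactly half of the strings in S (with multiplicity)
    have a 0, so that either bit value is optimal for a median."""
    m = len(S)
    return [s for s, column in enumerate(zip(*S)) if 2 * column.count(0) == m]
```

Since each ambiguous coordinate can be set freely and every other coordinate is forced, the number of optimal medians is 2 raised to the number of ambiguous coordinates.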

Fix an arbitrary D3CNF, $F$, with $n$ variables and $k$ clauses. Fix a clause $c_i$ in $F$. For this clause, we are
now ready to define a set of 50 strings

$\mathcal{C}_i = \{\nu_1^i, \nu_2^i, \ldots, \nu_{50}^i\}$.

First assume that $c_i = v_\alpha \vee v_\beta \vee v_\gamma$, a disjunction of three positive literals. Because $F$ is a D3CNF, we may
assume $\alpha < \beta < \gamma$.

For each $j \in [50]$, we will supply the following three pieces of information for $\nu_j^i$:
(a) The values for $\nu_j^i[x_\alpha]$, $\nu_j^i[y_\alpha]$, $\nu_j^i[x_\beta]$, $\nu_j^i[y_\beta]$, $\nu_j^i[x_\gamma]$, $\nu_j^i[y_\gamma]$ will be explicitly defined.

(b) A constant $\kappa_j \in \{0,1\}$ will be given so that $\nu_j^i[x_\ell] = \nu_j^i[y_{\ell'}] = \kappa_j$ for all $\ell, \ell' \in [n] \setminus \{\alpha, \beta, \gamma\}$.

(c) The string $\nu_j^i$ will be assigned some number of additional ones.

By Definition 3.4 this is sufficient to explicitly define $\nu_j^i$.

In Table 1 there is a row for each string in $\mathcal{C}_i$. The three defining pieces of information are found in
Columns A, B, and C respectively. The remainder of the table will be explained in Subsection 3.2.

For each $j \in [50]$, row $j$ of Table 1 supplies the three ingredients needed to define $\nu_j^i$. By matching the
6-bit string in Column A of row $j$ with

$(\nu_j^i[x_\alpha], \nu_j^i[y_\alpha], \nu_j^i[x_\beta], \nu_j^i[y_\beta], \nu_j^i[x_\gamma], \nu_j^i[y_\gamma])$

we obtain the 6 values for (a). The constant $\kappa_j$ for (b) is found in Column B of row $j$. For (c), the number
of additional ones in $\nu_j^i$ is found in Column C of row $j$.

With a slight modification in the reading of Column A, the 50 rows of Table 1 will also supply the 50
strings for a clause which contains negative literals. Fix an arbitrary clause $c_i$ in $F$ which now may have
negative literals. For each $j \in [50]$, the definition of string $\nu_j^i$ will again be based on Columns A, B, C of
row $j$ in Table 1, where the same information will be gleaned from Columns B and C. The only difference is
with Column A, which will be explained next.

If $c_i$ contains the variables $v_\alpha, v_\beta, v_\gamma$, where some of these may be present as negative literals, set
$S_i := \{x_\alpha, y_\alpha, x_\beta, y_\beta, x_\gamma, y_\gamma\}$. We call $S_i$ the support set of $c_i$. Clause $c_i$ must be one of the 8 clauses listed in
the left column of Table 2. For $j \in [50]$, $\nu_j^i$ is defined on the coordinates $S_i$ by matching the entry in the right
column of the $c_i$ row of Table 2 with the 6-bit string in Column A of the $j$th row of Table 1.

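Example 3.10. For an example, when $c_i = v_\alpha \vee \overline{v_\beta} \vee \overline{v_\gamma}$, the last row of Table 1 says that the string $\nu_{50}^i$
must have

$(\nu_{50}^i[x_\alpha], \nu_{50}^i[y_\alpha], \nu_{50}^i[y_\beta], \nu_{50}^i[x_\beta], \nu_{50}^i[y_\gamma], \nu_{50}^i[x_\gamma]) = (101010)$.

Therefore, $\nu_{50}^i[x_\alpha] = 1$, $\nu_{50}^i[y_\alpha] = 0$, $\nu_{50}^i[x_\beta] = 0$, $\nu_{50}^i[y_\beta] = 1$, $\nu_{50}^i[x_\gamma] = 0$, and $\nu_{50}^i[y_\gamma] = 1$. Further, Column
B implies $\nu_{50}^i[x_\ell] = \nu_{50}^i[y_\ell] = 1$ for all $\ell \in [n] \setminus \{\alpha, \beta, \gamma\}$ and, from Column C, $\nu_{50}^i$ will have 2 additional
ones.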
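The reading rule for Column A can be sketched as a small function: for a positive literal the two bits are read as $(x, y)$, and for a negative literal as $(y, x)$. The argument names and the dictionary keys below are hypothetical labels chosen for illustration.

```python
def read_column_a(bits, signs):
    """bits: 6-character string from Column A of Table 1.
    signs: three booleans for (alpha, beta, gamma), True for a positive literal.
    Returns the values of the string on the six support coordinates."""
    values = {}
    for k, (name, positive) in enumerate(zip(("alpha", "beta", "gamma"), signs)):
        first, second = int(bits[2 * k]), int(bits[2 * k + 1])
        # Positive literal: bits are (x, y). Negative literal: bits are (y, x).
        x, y = (first, second) if positive else (second, first)
        values["x_" + name], values["y_" + name] = x, y
    return values
```

Calling `read_column_a("101010", (True, False, False))` reproduces the six values listed in Example 3.10.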

Now that we have defined $\mathcal{C}_i$ for any clause $c_i$, let us analyze $\mathcal{M}(\mathcal{C}_i)$. By Remark 3.5, for every $\mu \in \mathcal{M}(\mathcal{C}_i)$
and $\ell \in [t]$, $\mu[e_\ell] = 0$.

In Column B of Table 1 it is evident that for any $\ell \in [n] \setminus \{\alpha, \beta, \gamma\}$, the number of strings $\nu_j^i$ with
$\nu_j^i[x_\ell] = 0$ is $25 = \frac{1}{2}|\mathcal{C}_i|$. Therefore, by Definition 3.8 the coordinates $x_\ell$ and $y_\ell$ are ambiguous. Through
careful inspection of the strings in Column A of Table 1 we see that coordinates $x_{\ell'}$ and $y_{\ell'}$ are also
ambiguous for each $\ell' \in \{\alpha, \beta, \gamma\}$. Therefore we have proven the following fact, which was one of our goals:

Fact 3.11. For an arbitrary clause $c_i$ with three distinct variables,

$\mathcal{M}(\mathcal{C}_i) = \{0,1\}^{2n} \times \{0\}^t$.

Remark 3.12. By visual inspection of Table 1, the binary strings in $\mathcal{C}_i$ can be partitioned into pairs where the
two strings in a pair are complementary on the first $2n$ coordinates.

3.2. Hamming distances between $\mathcal{C}_i$ and possible medians. Here we explain the remainder of Table 1.
Fix a clause $c_i$ in $F$ which will be used throughout this subsection. Suppose $c_i$ has variables $v_\alpha$, $v_\beta$, and
$v_\gamma$. By Fact 3.11, $\mathcal{M}(\mathcal{C}_i) = \{0,1\}^{2n} \times \{0\}^t$. Therefore, $\mathcal{M}'(\mathcal{C}_i)$ must be equal to $\{01, 10\}^n \times \{0\}^t$. For this
subsection, define

$\mathcal{M} := \mathcal{M}(\mathcal{C}_i), \quad \mathcal{M}' := \mathcal{M}'(\mathcal{C}_i)$.

Define an equivalence relation $\sim$ on $\mathcal{M}'$ such that two medians are equivalent if they agree on the
coordinates in the support set $S_i$ of $c_i$. The result will be 8 equivalence classes because $\mu[x_\ell] \ne \mu[y_\ell]$ for each
$\ell \in \{\alpha, \beta, \gamma\}$ for each $\mu \in \mathcal{M}'$.

Here we define a one-to-one correspondence between the equivalence classes of $\mathcal{M}'$ under $\sim$ and the 6-bit
strings heading Columns M1 through M8 in Table 1.
Table 1. The 50 strings in $\mathcal{C}_i$ for a single clause $c_i$ along with their Hamming distance from
medians in $\mathcal{M}'$.

Column A: values of $\nu_j^i$ on its support set. Column B: $\nu_j^i[x_\ell] = \nu_j^i[y_\ell]$ for $v_\ell \notin c_i$. Column C: additional ones.

Row  A       B  C   M1      M2      M3      M4      M5      M6      M7      M8
                    101010  101001  100110  011010  100101  011001  010110  010101
  1  010000  0  +3  n+4     n+4     n+4     n+2     n+4     n+2     n+2     n+2
  2  000100  0  +3  n+4     n+4     n+2     n+4     n+2     n+4     n+2     n+2
  3  000001  0  +3  n+4     n+2     n+4     n+4     n+2     n+2     n+4     n+2
  4  101111  1  +0  n-1     n-1     n-1     n+1     n-1     n+1     n+1     n+1
  5  111011  1  +0  n-1     n-1     n+1     n-1     n+1     n-1     n+1     n+1
  6  111110  1  +0  n-1     n+1     n-1     n-1     n+1     n+1     n-1     n+1
  7  101000  0  +2  n       n       n+2     n+2     n+2     n+2     n+4     n+4
  8  100010  0  +2  n       n+2     n       n+2     n+2     n+4     n+2     n+4
  9  001010  0  +2  n       n+2     n+2     n       n+4     n+2     n+2     n+4
 10  101000  0  +2  n       n       n+2     n+2     n+2     n+2     n+4     n+4
 11  100001  0  +2  n+2     n       n+2     n+4     n       n+2     n+4     n+2
 12  001001  0  +2  n+2     n       n+4     n+2     n+2     n       n+4     n+2
 13  100100  0  +2  n+2     n+2     n       n+4     n       n+4     n+2     n+2
 14  100010  0  +2  n       n+2     n       n+2     n+2     n+4     n+2     n+4
 15  000110  0  +2  n+2     n+4     n       n+2     n+2     n+4     n       n+2
 16  011000  0  +2  n+2     n+2     n+4     n       n+4     n       n+2     n+2
 17  010010  0  +2  n+2     n+4     n+2     n       n+4     n+2     n       n+2
 18  001010  0  +2  n       n+2     n+2     n       n+4     n+2     n+2     n+4
 19  100100  0  +2  n+2     n+2     n       n+4     n       n+4     n+2     n+2
 20  100001  0  +2  n+2     n       n+2     n+4     n       n+2     n+4     n+2
 21  000101  0  +2  n+4     n+2     n+2     n+4     n       n+2     n+2     n
 22  011000  0  +2  n+2     n+2     n+4     n       n+4     n       n+2     n+2
 23  010001  0  +2  n+4     n+2     n+4     n+2     n+2     n       n+2     n
 24  001001  0  +2  n+2     n       n+4     n+2     n+2     n       n+4     n+2
 25  010100  0  +2  n+4     n+4     n+2     n+2     n+2     n+2     n       n
 26  010010  0  +2  n+2     n+4     n+2     n       n+4     n+2     n       n+2
 27  000110  0  +2  n+2     n+4     n       n+2     n+2     n+4     n       n+2
 28  101011  1  +1  n-1     n-1     n+1     n+1     n+1     n+1     n+3     n+3
 29  101101  1  +1  n+1     n-1     n+1     n+3     n-1     n+1     n+3     n+1
 30  111001  1  +1  n+1     n-1     n+3     n+1     n+1     n-1     n+3     n+1
 31  100111  1  +1  n+1     n+1     n-1     n+3     n-1     n+3     n+1     n+1
 32  101110  1  +1  n-1     n+1     n-1     n+1     n+1     n+3     n+1     n+3
 33  110110  1  +1  n+1     n+3     n-1     n+1     n+1     n+3     n-1     n+1
 34  011011  1  +1  n+1     n+1     n+3     n-1     n+3     n-1     n+1     n+1
 35  011110  1  +1  n+1     n+3     n+1     n-1     n+3     n+1     n-1     n+1
 36  111010  1  +1  n-1     n+1     n+1     n-1     n+3     n+1     n+1     n+3
 37  100111  1  +1  n+1     n+1     n-1     n+3     n-1     n+3     n+1     n+1
 38  101101  1  +1  n+1     n-1     n+1     n+3     n-1     n+1     n+3     n+1
 39  110101  1  +1  n+3     n+1     n+1     n+3     n-1     n+1     n+1     n-1
 40  011011  1  +1  n+1     n+1     n+3     n-1     n+3     n-1     n+1     n+1
 41  011101  1  +1  n+3     n+1     n+3     n+1     n+1     n-1     n+1     n-1
 42  111001  1  +1  n+1     n-1     n+3     n+1     n+1     n-1     n+3     n+1
 43  010111  1  +1  n+3     n+3     n+1     n+1     n+1     n+1     n-1     n-1
 44  011110  1  +1  n+1     n+3     n+1     n-1     n+3     n+1     n-1     n+1
 45  110110  1  +1  n+1     n+3     n-1     n+1     n+1     n+3     n-1     n+1
 46  010111  1  +1  n+3     n+3     n+1     n+1     n+1     n+1     n-1     n-1
 47  011101  1  +1  n+3     n+1     n+3     n+1     n+1     n-1     n+1     n-1
 48  110101  1  +1  n+3     n+1     n+1     n+3     n-1     n+1     n+1     n-1
 49  010101  0  +1  n+4     n+2     n+2     n+2     n       n       n       n-2
 50  101010  1  +2  n-1     n+1     n+1     n+1     n+3     n+3     n+3     n+5

For a clause $c_i$, the left three columns define the 50 strings in $\mathcal{C}_i$. In row $j$, the 6-bit string in Column A gives the values of $\nu_j^i$ on the support set $S_i$ as
described by Table 2. Column B gives the constant value to be assigned to all $x_\ell$ and $y_\ell$ which are not in $S_i$. Column C specifies
the number of additional ones in $\nu_j^i$. The collection $\{01, 10\}^3$ is listed along the top row as the headings of Columns M1 through M8. The entry in row $j$ and
column M$\ell$ is the Hamming distance between the 6-bit string in row $j$ and the 6-bit string at the top of the column, plus $n - 3$ for the coordinate
pairs outside $S_i$, plus the number of additional ones in $\nu_j^i$.

Definition 3.13. Fix a clause $c_i$ and an integer $\ell \in [8]$. Consider the 6-bit string $\delta$ which heads column M$\ell$.
In Table 2 locate the tuple in the right column corresponding to our fixed clause $c_i$. After replacing each $\nu_j^i$
with $\mu$ in the tuple, match this tuple with $\delta$. This gives six values that a median $\mu \in \mathcal{M}'$ must have if it is
in the equivalence class represented by the column heading $\delta$.
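Table 2. A key for interpreting Column A of Table 1.

Clause                                                                   Key to interpret Column A of Table 1
$v_\alpha \vee v_\beta \vee v_\gamma$                                    $(\nu_j^i[x_\alpha], \nu_j^i[y_\alpha], \nu_j^i[x_\beta], \nu_j^i[y_\beta], \nu_j^i[x_\gamma], \nu_j^i[y_\gamma])$
$\overline{v_\alpha} \vee v_\beta \vee v_\gamma$                         $(\nu_j^i[y_\alpha], \nu_j^i[x_\alpha], \nu_j^i[x_\beta], \nu_j^i[y_\beta], \nu_j^i[x_\gamma], \nu_j^i[y_\gamma])$
$v_\alpha \vee \overline{v_\beta} \vee v_\gamma$                         $(\nu_j^i[x_\alpha], \nu_j^i[y_\alpha], \nu_j^i[y_\beta], \nu_j^i[x_\beta], \nu_j^i[x_\gamma], \nu_j^i[y_\gamma])$
$v_\alpha \vee v_\beta \vee \overline{v_\gamma}$                         $(\nu_j^i[x_\alpha], \nu_j^i[y_\alpha], \nu_j^i[x_\beta], \nu_j^i[y_\beta], \nu_j^i[y_\gamma], \nu_j^i[x_\gamma])$
$\overline{v_\alpha} \vee \overline{v_\beta} \vee v_\gamma$              $(\nu_j^i[y_\alpha], \nu_j^i[x_\alpha], \nu_j^i[y_\beta], \nu_j^i[x_\beta], \nu_j^i[x_\gamma], \nu_j^i[y_\gamma])$
$\overline{v_\alpha} \vee v_\beta \vee \overline{v_\gamma}$              $(\nu_j^i[y_\alpha], \nu_j^i[x_\alpha], \nu_j^i[x_\beta], \nu_j^i[y_\beta], \nu_j^i[y_\gamma], \nu_j^i[x_\gamma])$
$v_\alpha \vee \overline{v_\beta} \vee \overline{v_\gamma}$              $(\nu_j^i[x_\alpha], \nu_j^i[y_\alpha], \nu_j^i[y_\beta], \nu_j^i[x_\beta], \nu_j^i[y_\gamma], \nu_j^i[x_\gamma])$
$\overline{v_\alpha} \vee \overline{v_\beta} \vee \overline{v_\gamma}$   $(\nu_j^i[y_\alpha], \nu_j^i[x_\alpha], \nu_j^i[y_\beta], \nu_j^i[x_\beta], \nu_j^i[y_\gamma], \nu_j^i[x_\gamma])$

For any clause in the left column, the corresponding entry in
the right column above will be matched with the 6-bit string
in Column A of row $j$ of Table 1 to determine the value of $\nu_j^i$
at each bit in the support set $S_i$.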


In Definition 3.1 we defined a correspondence between $\mathcal{M}'$ and truth assignments for $F$. In Definition 3.3
we introduced the notation $\mathcal{M}'_F(\mathcal{C}_i)$ for the collection of medians in $\mathcal{M}'$ which correspond to satisfying truth
assignments for $F$. Similarly, we defined $\mathcal{M}'_{c_i}(\mathcal{C}_i)$ for each clause $c_i$ in $F$. For the remainder of this subsection,
set

$\mathcal{M}'_F := \mathcal{M}'_F(\mathcal{C}_i), \quad \mathcal{M}'_{c_i} := \mathcal{M}'_{c_i}(\mathcal{C}_i)$.

The following claim uses the correspondence in Definition 3.13 to connect $\mathcal{M}' \setminus \mathcal{M}'_{c_i}$ with a particular
equivalence class.

Claim 3.14. Let $c_i$ be a clause in $F$. For any $\mu \in \mathcal{M}'$, $\mu$ is in the equivalence class represented by Column
M8 of Table 1 if and only if $\mu \in \mathcal{M}' \setminus \mathcal{M}'_{c_i}$.

Proof. Fix a clause $c_i$ with variables $v_\alpha, v_\beta, v_\gamma$. This clause may have some negative literals. We focus our
attention on $v_\alpha$. The arguments for $v_\beta$ and $v_\gamma$ are exactly the same.

There are two cases depending on whether $v_\alpha$ appears as a positive literal or a negative literal in $c_i$.

In the case where $v_\alpha$ appears in $c_i$ as a positive literal, the truth assignment which makes $c_i$ false assigns
a value of false to $v_\alpha$. A corresponding median $\mu \in \mathcal{M}'$ has $\mu[x_\alpha] = 0$ and $\mu[y_\alpha] = 1$. Because $v_\alpha$ appears
as a positive literal in $c_i$, the entry in the second column of Table 2 has $\mu[x_\alpha]$ followed by $\mu[y_\alpha]$. So, in this
case, the 6-bit string which heads the column for medians in $\mathcal{M}' \setminus \mathcal{M}'_{c_i}$ has 01 in the first two entries.

In the case where $v_\alpha$ appears as a negative literal in $c_i$, the non-satisfying truth assignments for $c_i$ must
have $v_\alpha$ true. The corresponding medians $\mu \in \mathcal{M}'$ will have $\mu[x_\alpha] = 1$ and $\mu[y_\alpha] = 0$. For the clauses with
variable $v_\alpha$ appearing as a negative literal in $c_i$, a quick glance at Table 2 reveals that $\mu[y_\alpha]$ immediately
precedes $\mu[x_\alpha]$ in the 6-bit column headings in Table 1. As a result, the column representing medians in
$\mathcal{M}' \setminus \mathcal{M}'_{c_i}$ has 01 in the first two entries.

Repeating this argument for $v_\beta$ and $v_\gamma$, we see that medians in $\mathcal{M}' \setminus \mathcal{M}'_{c_i}$ are represented by the column
with heading 010101. □

Now that we have defined the rows and columns of Table 1, we conclude this subsection by defining the entries within Table 1 for a fixed clause $c_i$.

Let $\mu \in \mathcal{M}'$ be an arbitrary median that falls into the equivalence class represented by Column M$\ell$ for some $\ell \in [8]$. The entry $a_{j\ell}$ in Row $j$ and Column M$\ell$ of Table 1 is $H(\mu, \nu_j^i)$. This value can be calculated as follows:

• First, take the Hamming distance between the 6-bit string in Column A of Row $j$ and the 6-bit string in the header of Column M$\ell$. This is equal to the Hamming distance between the restrictions of $\mu$ and $\nu_j^i$ to the support set $S_i$ for $c_i$.

• For any $s \notin \{\alpha, \beta, \gamma\}$, $\mu[x_s] \neq \mu[y_s]$ and $\nu_j^i[x_s] = \nu_j^i[y_s]$. Therefore the Hamming distance between $(\mu[x_s], \mu[y_s])$ and $(\nu_j^i[x_s], \nu_j^i[y_s])$ is 1 for each $s \in [n] \setminus \{\alpha, \beta, \gamma\}$.

• Finally, because $\mu[e_s] = 0$ for all $s \in [t]$, the Hamming distance between the restrictions of $\mu$ and $\nu_j^i$ to the coordinates $(e_1, e_2, \ldots, e_t)$ is the number of additional ones in $\nu_j^i$, which is found in Column C of Row $j$.

Adding these three values together gives the entry $a_{j\ell}$.
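The three-part sum just described can be sketched in a few lines of Python. This is only an illustration: the function name `table_entry` and its arguments are hypothetical, the 6-bit strings are assumed to be given as text, and no actual rows of Table 1 are reproduced.

```python
def hamming(u, v):
    """Hamming distance between two equal-length bit strings."""
    if len(u) != len(v):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(u, v))


def table_entry(row_bits, column_header, n, additional_ones):
    """Entry in Row j and Column M-l, assembled from the three parts above.

    row_bits        -- 6-bit string from Column A of Row j (assumed input)
    column_header   -- 6-bit string heading Column M-l (assumed input)
    n               -- number of variables in the formula
    additional_ones -- value from Column C of Row j
    """
    support_part = hamming(row_bits, column_header)  # restriction to the support set
    other_variables = n - 3                          # distance 1 per variable outside the clause
    return support_part + other_variables + additional_ones
```

For instance, `table_entry("010101", "010101", 5, 0)` evaluates to 2, since only the two variables outside the clause contribute.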


3.3. Distinguishing the satisfying truth assignments. Fix a clause $c_i$ in an arbitrary D3CNF $\Gamma$. For this subsection, we again set $\mathcal{M}' := \mathcal{M}'(\mathcal{C}_i)$ and $\mathcal{M}'_{c_i} := \mathcal{M}'_{c_i}(\mathcal{C}_i)$. For each $\mu \in \mathcal{M}' \setminus \mathcal{M}'_{c_i}$, $\mu$ is in the equivalence class represented by Column M8 according to Claim 3.14. Then reading the entries in Column M8 of Table 1 we find the multiset (where parenthetical subscripts give the multiplicity of that value in the multiset):

\[ \{H(\mu, \nu_j^i) : j \in [50]\} = \{(n-2)_{(1)}, (n-1)_{(6)}, n_{(3)}, (n+1)_{(15)}, (n+2)_{(15)}, (n+3)_{(3)}, (n+4)_{(6)}, (n+5)_{(1)}\}. \tag{1} \]

Otherwise, for each median $\mu \in \mathcal{M}'_{c_i}$, $\mu$ is in one of 7 equivalence classes represented in Columns M1 through M7. The entries in each of these columns yield

\[ \{H(\mu, \nu_j^i) : j \in [50]\} = \{(n-1)_{(7)}, n_{(6)}, (n+1)_{(12)}, (n+2)_{(12)}, (n+3)_{(6)}, (n+4)_{(7)}\}. \tag{2} \]

Therefore, we can use $\mathcal{C}_i$ to distinguish between the medians in $\mathcal{M}'_{c_i}$ and the medians in $\mathcal{M}' \setminus \mathcal{M}'_{c_i}$. For example, given $\mu \in \mathcal{M}' = \{01, 10\}^n \times \{0\}^t$, if we determine that $(n+5) \in \{H(\mu, \nu_j^i) : j \in [50]\}$, then we can conclude $\mu \in \mathcal{M}' \setminus \mathcal{M}'_{c_i}$.
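As a quick sanity check on the two multisets (1) and (2), the following Python sketch stores them as counters and confirms that each accounts for all 50 strings and that the value $n+5$ separates them. The function names are hypothetical; the multiplicities are the ones stated above.

```python
from collections import Counter


def column_m8(n):
    """Distances for a median that violates the clause, as in (1)."""
    return Counter({n - 2: 1, n - 1: 6, n: 3, n + 1: 15,
                    n + 2: 15, n + 3: 3, n + 4: 6, n + 5: 1})


def columns_m1_to_m7(n):
    """Distances for a median that satisfies the clause, as in (2)."""
    return Counter({n - 1: 7, n: 6, n + 1: 12,
                    n + 2: 12, n + 3: 6, n + 4: 7})


def violates_clause(distances, n):
    """True when the distance multiset contains n + 5, which occurs only in (1)."""
    return (n + 5) in distances
```

Both counters sum to 50 for any $n$, matching the 50 strings attached to each clause.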

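Now we wish to consider all of the $\mathcal{C}_i$ multisets together. It is clear that each $x_i$ and each $y_i$ coordinate will remain ambiguous in the multiset $\biguplus_{i \in [k]} \mathcal{C}_i$. For the additional ones, we will take $t$ large enough to maintain the property that, for each $i \in [t]$, there is at most one binary string $\eta$ in $\biguplus_{i \in [k]} \mathcal{C}_i$ with $\eta[e_i] = 1$.

As a result,

\[ \mathcal{M}\Big(\biguplus_{i \in [k]} \mathcal{C}_i\Big) = \{0,1\}^{2n} \times \{0\}^t. \]

Further,

\[ \mathcal{M}'_{c_i} := \mathcal{M}'_{c_i}(\mathcal{C}_i) = \mathcal{M}'_{c_i}\Big(\biguplus_{i \in [k]} \mathcal{C}_i\Big), \]

\[ \mathcal{M}'_\Gamma := \mathcal{M}'_\Gamma(\mathcal{C}_i) = \mathcal{M}'_\Gamma\Big(\biguplus_{i \in [k]} \mathcal{C}_i\Big). \]

By definition of the sets $\mathcal{M}'_{c_i}$ and $\mathcal{M}'_\Gamma$,

\[ \mathcal{M}'_\Gamma = \bigcap_{i \in [k]} \mathcal{M}'_{c_i}, \]

\[ \mathcal{M}' \setminus \mathcal{M}'_\Gamma = \mathcal{M}' \setminus \bigcap_{i \in [k]} \mathcal{M}'_{c_i} = \bigcup_{i \in [k]} \big(\mathcal{M}' \setminus \mathcal{M}'_{c_i}\big). \]

Therefore the multiset $\biguplus_{i \in [k]} \mathcal{C}_i$ will serve as a tool to distinguish $\mathcal{M}'_\Gamma$ from $\mathcal{M}' \setminus \mathcal{M}'_\Gamma$.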


(3) 

(4) 


4. Complexity of computing $Z(\mathcal{B}, x!)$

Before stating Theorem 4.4 we need a result which is equivalent to the Prime Number Theorem. Define

\[ \theta(x) := \sum_{\substack{p \le x \\ p \text{ prime}}} \log p. \]

Theorem 4.1. $\theta(x) \sim x$.

As a result, the next lemma and corollary hold.

Lemma 4.2 (Rosser 1941). For $2 \le x$,

\[ \left(1 - \frac{2.85}{\log x}\right) x < \theta(x) < \left(1 + \frac{2.85}{\log x}\right) x. \]

Corollary 4.3. For any $n > 300$,

\[ e^{n/2} < \prod_{\substack{p \le n \\ p \text{ prime}}} p. \]
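The bounds above are easy to probe numerically. The sketch below assumes that $\theta(x)$ is the sum of $\log p$ over primes $p \le x$, so that the product of those primes equals $e^{\theta(x)}$; the helper names `primes_up_to` and `theta` are hypothetical and serve only this illustration.

```python
from math import log


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p with p <= limit (limit >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def theta(x):
    """Sum of log p over primes p <= x, so exp(theta(x)) is the product of those primes."""
    return sum(log(p) for p in primes_up_to(int(x)))
```

Checking `theta(n) > n / 2` for a few values of $n$ above 300 is then the logarithmic form of the corollary's inequality.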


Now we can prove the main result for star trees. 


Theorem 4.4. Calculating $Z(\mathcal{B}, x!)$ is #P-complete.

Proof. In Lemma 2.13 we verified that calculating $Z(\mathcal{B}, x!)$ is in #P. To show #P-completeness, we give a
polynomial time reduction from #D3SAT. Fix an arbitrary D3CNF $\Gamma = c_1 \wedge c_2 \wedge \ldots \wedge c_k$ where each $c_i$ is a clause and $\Gamma$ has $n$ variables.

Using the bound in Corollary 4.3, let $n' = \max\{300, n + 5\}$. Fix a prime number $p$ which is greater than $n'$ and at most $5n'$. Let

\[ q := p - (n + 5). \]

We will explicitly define a multiset

\[ \mathcal{D}(p) = \mathcal{A}(p) \uplus \biguplus_{i \in [n]} \mathcal{B}_i(p) \uplus \biguplus_{i \in [k]} \mathcal{C}_i(p) \]

consisting of $2 + 2n + 50k$ binary strings with coordinates

\[ (x_1, y_1, x_2, y_2, \ldots, x_n, y_n, e_1, \ldots, e_{t(p)}) \]

where

\[ t(p) := 2(q + 4) + 2n(q + 3) + k(75 + 50q). \tag{5} \]

The coordinates $e_1, e_2, \ldots, e_{t(p)}$ are for the additional ones. In order to define each $\eta \in \mathcal{D}(p)$, we will give exact values for $\eta[x_j]$ and $\eta[y_j]$ for each $j \in [n]$ and specify the number of additional ones that $\eta$ will have. Definition 3.4 tells how to obtain the values of $\eta[e_j]$ for each $j \in [t(p)]$ from this information.

All strings in $\mathcal{D}(p)$ will come in pairs which are complementary on the first $2n$ entries (Definition 3.8). As a result, we can use Fact 3.9 to see that each of the first $2n$ coordinates is ambiguous in $\mathcal{D}(p)$.

Now we begin defining the strings in the multisets that together create $\mathcal{D}(p)$. The set $\mathcal{A}(p)$ consists of two strings, $\alpha$ and $\bar\alpha$. Define $\alpha$ to have $\alpha[x_i] = \alpha[y_i] = 1$ for all $i \in [n]$ and $q + 4$ additional ones. Define $\bar\alpha$ to be complementary to $\alpha$ on the first $2n$ entries and have $q + 4$ additional ones.

For each $j \in [n]$, the set $\mathcal{B}_j(p)$ will consist of two strings, $\beta_j$ and $\bar\beta_j$. Define $\beta_j$ to be the string with $\beta_j[x_j] = \beta_j[y_j] = 1$, for all $j' \in [n]$ with $j' \neq j$, $\beta_j[x_{j'}] = \beta_j[y_{j'}] = 0$, and $q + 3$ additional ones. Define $\bar\beta_j$ to be complementary to $\beta_j$ on the first $2n$ entries and have $q + 3$ additional ones.

For each $i \in [k]$, the set $\mathcal{C}_i(p)$ will have 50 strings. These are obtained by adding $q$ more additional ones to the 50 strings in $\mathcal{C}_i$ which were defined through Table 1 (see Section 3.2). In other words, increase each entry in Column C of Table 1 by $q$ to obtain $\mathcal{C}_i(p)$.

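In summary, we have constructed the strings

\[ \mathcal{D}(p) := \mathcal{A}(p) \uplus \biguplus_{i \in [n]} \mathcal{B}_i(p) \uplus \biguplus_{i \in [k]} \mathcal{C}_i(p). \]

As described in Definition 3.3 and for each clause $c_i$ in $\Gamma$, set

\[ \mathcal{M}(p) := \mathcal{M}(\mathcal{D}(p)), \qquad \mathcal{M}'(p) := \mathcal{M}'(\mathcal{D}(p)), \]

\[ \mathcal{M}'_{c_i}(p) := \mathcal{M}'_{c_i}(\mathcal{D}(p)), \qquad \mathcal{M}'_\Gamma(p) := \mathcal{M}'_\Gamma(\mathcal{D}(p)). \]

As stated earlier, each $\mu \in \mathcal{M}(p)$ has $\mu[e_j] = 0$ for all $j \in [t(p)]$. Additionally, because all of the strings in $\mathcal{D}(p)$ come in complementary pairs, the coordinates $x_j$ and $y_j$ are ambiguous for each $j \in [n]$ (Fact 3.9). Thus there are $2^{2n}$ medians $\mu$. More precisely,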

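\[ \mathcal{M}(p) = \{0,1\}^{2n} \times \{0\}^{t(p)} \quad\text{and}\quad \mathcal{M}'(p) = \{01, 10\}^{n} \times \{0\}^{t(p)}. \tag{6} \]

Define

\[ \mathcal{H}(\mu, \mathcal{A}(p)) := \prod_{\alpha \in \mathcal{A}(p)} H(\mu, \alpha)! \]

and likewise define $\mathcal{H}(\mu, \mathcal{B}_j(p))$ and $\mathcal{H}(\mu, \mathcal{C}_i(p))$ for each $j \in [n]$ and $i \in [k]$. Therefore the number of scenarios admitted by median $\mu$ can be expressed by

\[ \mathcal{H}(\mu) := \mathcal{H}(\mu, \mathcal{A}(p)) \cdot \prod_{i \in [n]} \mathcal{H}(\mu, \mathcal{B}_i(p)) \cdot \prod_{i \in [k]} \mathcal{H}(\mu, \mathcal{C}_i(p)). \]

At this point, we wish to calculate $\mathcal{H}(\mu) \bmod p$ for each median $\mu \in \mathcal{M}(p)$. To analyze $\mathcal{H}(\mu)$ for each $\mu \in \mathcal{M}(p)$, we define the following 3 properties that a median $\mu \in \mathcal{M}(p)$ may have.

Property 1. $\sum_{i \in [n]} (\mu[x_i] + \mu[y_i]) = n$.

Property 2. $\mu \in \mathcal{M}'(p)$.

Property 3. $\mu \in \mathcal{M}'_\Gamma(p)$.

First notice that these properties are nested. Any $\mu \in \mathcal{M}(p)$ with Property 2 must also have Property 1. Likewise, if $\mu$ has Property 3, it will also have Property 2. The next 4 claims divide $\mathcal{M}(p)$ into 4 classes and examine $\mathcal{H}(\mu)$ for medians in each class.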

Claim 4.5. For arbitrary $\mu \in \mathcal{M}(p)$, if $\mu$ does not have Property 1, and consequently does not have Property 2 or 3, then $\mathcal{H}(\mu) \equiv 0 \bmod p$.

Proof. Let $\mu$ be an arbitrary median in $\mathcal{M}(p)$. For $\alpha \in \mathcal{A}(p)$, Fact 3.7 gives

\[ H(\mu, \alpha) + H(\mu, \bar\alpha) = 2n + (q + 4) + (q + 4) = 2p - 2. \]

Hence, there is an integer $r$ such that $q + 4 \le r \le 2n + q + 4$ and $H(\mu, \alpha) = r$ with

\[ \mathcal{H}(\mu, \mathcal{A}(p)) = r!\,(2p - 2 - r)!. \]

Since $\mu$ does not have Property 1, we can conclude that exactly one of the following holds:

\[ H(\mu, \alpha) \ge (n + 1) + (q + 4) = p, \quad\text{or}\quad H(\mu, \bar\alpha) \ge (n + 1) + (q + 4) = p. \]

Therefore, either $r \ge p$ or $(2p - 2 - r) \ge p$. In the first case, $r!$ is divisible by $p$ and, in the second, $(2p - 2 - r)!$ is divisible by $p$. Therefore $\mathcal{H}(\mu, \mathcal{A}(p)) \equiv 0 \bmod p$ and consequently $\mathcal{H}(\mu) \equiv 0 \bmod p$. □

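Claim 4.6. For an arbitrary $\mu \in \mathcal{M}(p)$, if $\mu$ has Property 1, but does not have Property 2, then $\mathcal{H}(\mu) \equiv 0 \bmod p$.

Proof. Suppose $\mu \in \mathcal{M}(p) \setminus \mathcal{M}'(p)$ but $\mu$ has Property 1. Because $\mu \notin \mathcal{M}'(p)$, there is an integer $j_0 \in [n]$ such that $\mu[x_{j_0}] = \mu[y_{j_0}]$. In the case when $\mu[x_{j_0}] = 0$, we have $H(\mu, \beta_{j_0}) = (n + 2) + (q + 3) = p$. Otherwise $\mu[x_{j_0}] = 1$, which implies $H(\mu, \bar\beta_{j_0}) = (n + 2) + (q + 3) = p$. In either case,

\[ \mathcal{H}(\mu, \mathcal{B}_{j_0}(p)) = p!\,(p - 4)! \]

and consequently $\mathcal{H}(\mu) \equiv 0 \bmod p$. □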

Claim 4.7. For an arbitrary $\mu \in \mathcal{M}(p)$, if $\mu$ has Properties 1 and 2, but does not have Property 3, then $\mathcal{H}(\mu) \equiv 0 \bmod p$.

Proof. Let $\mu$ be in $\mathcal{M}'(p) \setminus \mathcal{M}'_\Gamma(p)$. Since $\mu$ corresponds to a truth assignment which does not satisfy $\Gamma$, there is a clause $c_{i_0}$ in $\Gamma$ which is not satisfied by this truth assignment. Therefore $\mu \in \mathcal{M}'(p) \setminus \mathcal{M}'_{c_{i_0}}(p)$. By (1), before adding the $q$ additional ones to each string from $\mathcal{C}_{i_0}$, we have

\[ \{H(\mu, \nu_j^{i_0}) : \nu_j^{i_0} \in \mathcal{C}_{i_0}\} = \{(n-2)_{(1)}, (n-1)_{(6)}, n_{(3)}, (n+1)_{(15)}, (n+2)_{(15)}, (n+3)_{(3)}, (n+4)_{(6)}, (n+5)_{(1)}\}. \tag{7} \]

To create $\mathcal{C}_{i_0}(p)$, we added $q$ additional ones to each string in $\mathcal{C}_{i_0}$, which increased each Hamming distance by $q$. Therefore

\[ \{H(\mu, \nu) : \nu \in \mathcal{C}_{i_0}(p)\} = \{(p-7)_{(1)}, (p-6)_{(6)}, (p-5)_{(3)}, (p-4)_{(15)}, (p-3)_{(15)}, (p-2)_{(3)}, (p-1)_{(6)}, p_{(1)}\}. \]

As a result,

\[ \mathcal{H}(\mu, \mathcal{C}_{i_0}(p)) = (p-7)!\,(p-6)!^{6}\,(p-5)!^{3}\,(p-4)!^{15}\,(p-3)!^{15}\,(p-2)!^{3}\,(p-1)!^{6}\,p! \]

which is divisible by $p$. Therefore $\mathcal{H}(\mu) \equiv 0 \bmod p$. □

Claim 4.8. For an arbitrary $\mu \in \mathcal{M}(p)$ having Properties 1, 2, and 3, the value

\[ \mathcal{H}(\mu) = (p-6)!^{7k}\,(p-5)!^{6k}\,(p-4)!^{12k}\,(p-3)!^{12k}\,(p-2)!^{6k+2n}\,(p-1)!^{7k+2} \]

which is not congruent to 0 modulo $p$.

Proof. Let $\mu \in \mathcal{M}'_\Gamma(p)$. Because it has Property 1,

\[ \mathcal{H}(\mu, \mathcal{A}(p)) = (n + (q + 4))!^{2} = (p-1)!^{2}. \]

Since $\mu$ has Property 2, for any $i \in [n]$,

\[ \mathcal{H}(\mu, \mathcal{B}_i(p)) = (n + (q + 3))!^{2} = (p-2)!^{2}. \]

Finally, $\mu$ satisfies Property 3, which means $\mu \in \mathcal{M}'_{c_i}(p)$ for all clauses $c_i$ in $\Gamma$.

Recall that each string $\eta \in \mathcal{C}_i(p)$ is created from a string $\eta' \in \mathcal{C}_i$ by adding $q$ more additional ones. Therefore $H(\mu, \eta) = H(\mu, \eta') + q$. So, the multiset of distances from $\mu$ to $\mathcal{C}_i(p)$ can be obtained from the multiset for $\mathcal{C}_i$ found in (2) by adding $q$ to each element. As a result,

\[ \mathcal{H}(\mu, \mathcal{C}_i(p)) = (p-6)!^{7}\,(p-5)!^{6}\,(p-4)!^{12}\,(p-3)!^{12}\,(p-2)!^{6}\,(p-1)!^{7}. \]

Therefore

\[ \mathcal{H}(\mu) = (p-6)!^{7k}\,(p-5)!^{6k}\,(p-4)!^{12k}\,(p-3)!^{12k}\,(p-2)!^{6k+2n}\,(p-1)!^{7k+2}. \tag{8} \]

Because $p$ is prime and every factorial in (8) has argument less than $p$, $\mathcal{H}(\mu) \not\equiv 0 \bmod p$. □
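The divisibility pattern behind Claims 4.5, 4.6 and 4.8 can be checked by brute force on the $\mathcal{A}(p)$ and $\mathcal{B}_j(p)$ parts of the construction. The Python sketch below is an illustration under stated assumptions: a string is stored as its first $2n$ bits (ordered $x_1, y_1, \ldots, x_n, y_n$) together with its count of additional ones, $\beta_j$ is taken to have ones exactly at $x_j, y_j$, and the clause multisets $\mathcal{C}_i(p)$ are left out because Table 1 is not reproduced here. All function names are hypothetical.

```python
from itertools import product
from math import factorial


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def complement(u):
    return tuple(1 - b for b in u)


def star_parts(n, p):
    """A(p) and the list of B_j(p), each string as (first 2n bits, additional ones)."""
    q = p - (n + 5)
    alpha = (1,) * (2 * n)
    A = [(alpha, q + 4), (complement(alpha), q + 4)]
    Bs = []
    for j in range(n):
        beta = tuple(1 if i // 2 == j else 0 for i in range(2 * n))
        Bs.append([(beta, q + 3), (complement(beta), q + 3)])
    return A, Bs


def weight_mod_p(mu, strings, p):
    """Product of H(mu, eta)! over the given strings, reduced modulo p.

    A median is zero on every additional-ones coordinate, so each additional
    one adds exactly 1 to the Hamming distance.
    """
    w = 1
    for bits, extra in strings:
        w = w * factorial(hamming(mu, bits) + extra) % p
    return w


def check_claims(n, p):
    """Verify the mod-p behaviour of the A and B parts for every median."""
    A, Bs = star_parts(n, p)
    for mu in product((0, 1), repeat=2 * n):
        a = weight_mod_p(mu, A, p)
        bs = [weight_mod_p(mu, B, p) for B in Bs]
        prop1 = sum(mu) == n
        prop2 = all(mu[2 * i] != mu[2 * i + 1] for i in range(n))
        if not prop1 and a != 0:
            return False          # would contradict Claim 4.5
        if prop1 and not prop2 and 0 not in bs:
            return False          # would contradict Claim 4.6
        if prop2 and (a == 0 or 0 in bs):
            return False          # A and B parts must be units mod p
    return True
```

With $n = 3$ the construction uses $n' = 300$, so $p = 307$ is an admissible prime and `check_claims(3, 307)` exercises all 64 medians.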

Set

\[ T(p) := \sum_{\mu \in \mathcal{M}(p)} \mathcal{H}(\mu). \]

Set $K(p)$ equal to the function of $p$ displayed in (8). Thus $K(p)$ is precisely the number of SCJ scenarios admitted by an arbitrary $\mu \in \mathcal{M}'_\Gamma(p)$. If we calculate $T(p) \bmod p$, the four claims show that

\[ T(p) \equiv \sum_{\mu \in \mathcal{M}'_\Gamma(p)} \mathcal{H}(\mu) = |\mathcal{M}'_\Gamma(p)| \cdot K(p) \bmod p. \tag{9} \]

If $\gamma$ is the number of satisfying truth assignments for $\Gamma$, then $\gamma = |\mathcal{M}'_\Gamma(p)|$ by Definition 3.3. Therefore

\[ \gamma \cdot K(p) \equiv T(p) \bmod p. \]

Since $p$ does not divide $K(p)$ (Claim 4.8), there exists an integer $K'(p)$ such that $K(p) \cdot K'(p) \equiv 1 \bmod p$. Thus

\[ \gamma \equiv K'(p) \cdot T(p) \bmod p. \]

While this alone is not sufficient to determine the value of $\gamma$, we can repeat this construction for many different prime values to obtain more congruences.

Recall $p$ was fixed to be a prime greater than $n'$ and at most $5n'$. Repeat the above construction for each prime $p_1, p_2, \ldots, p_m$ in this range. The result is a list of congruences:

\[ \gamma \equiv K'(p_1) \cdot T(p_1) \bmod p_1, \]
\[ \gamma \equiv K'(p_2) \cdot T(p_2) \bmod p_2, \]
\[ \vdots \]
\[ \gamma \equiv K'(p_m) \cdot T(p_m) \bmod p_m. \]

Because $p_1, p_2, \ldots, p_m$ are all prime, the Chinese Remainder Theorem guarantees a solution for $\gamma$ which is unique modulo $\prod_{i \in [m]} p_i$. By Corollary 4.3,

\[ \prod_{i \in [m]} p_i = \frac{\prod_{p \le 5n',\ p \text{ prime}} p}{\prod_{p \le n',\ p \text{ prime}} p} > \frac{e^{5n'/2}}{e^{3n'/2}} = e^{n'} > e^{n}. \]

Since $\gamma$ is the number of satisfying truth assignments for $\Gamma$, and there are only $n$ variables, each of which can realize one of two values, $\gamma \le 2^n$. Since $\prod_{i \in [m]} p_i > e^n > 2^n \ge \gamma$, the Chinese Remainder Theorem gives the exact value of $\gamma$.
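The recovery step can be sketched in Python. This is a minimal illustration, not the authors' implementation: the names `primes_in_range`, `crt` and `recover_gamma` are hypothetical, and the pairs $(K(p) \bmod p,\ T(p) \bmod p)$ are assumed to be supplied by some oracle for the partition function.

```python
def primes_in_range(lo, hi):
    """Primes p with lo < p <= hi, found with the Sieve of Eratosthenes."""
    sieve = [True] * (hi + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, hi + 1, i):
                sieve[j] = False
    return [p for p in range(lo + 1, hi + 1) if sieve[p]]


def crt(residues):
    """Combine x = r (mod p) for distinct primes p; returns (x, product of the primes)."""
    x, modulus = 0, 1
    for p, r in residues.items():
        # Choose s so that x + modulus * s = r (mod p).
        s = (r - x) * pow(modulus, -1, p) % p
        x += modulus * s
        modulus *= p
    return x, modulus


def recover_gamma(data):
    """data maps each prime p to (K(p) mod p, T(p) mod p), with K(p) nonzero mod p."""
    residues = {p: pow(k_mod, -1, p) * t_mod % p
                for p, (k_mod, t_mod) in data.items()}
    return crt(residues)
```

The three-argument `pow` with exponent `-1` computes a modular inverse (Python 3.8 or later), which plays the role of $K'(p)$. The returned value is the exact count whenever the product of the primes exceeds $2^n$.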

In summary, for D3CNF T with n variables and k clauses, we use the Sieve of Eratosthenes to identify the 
primes between n' and 5n'. This runs in 0{n^) time. Then for each prime p in this interval (which is at most 
$\max\{2n, 600\}$ primes), we create $50k + 2n + 2$ binary strings of length $2n + t(p)$ where $t(p)$ is a polynomial in $n$ and $p$ with $p \in O(n)$. Finally, the Chinese Remainder Theorem will solve the system of congruences in $O(\log^2(p_1 p_2 \cdots p_m))$ time (Bach and Shallit 1996). For us, this is $O(n^2 \log^2 n)$ because each prime is at most $5n$ and $m < 2n$.

Therefore, if we had an algorithm to determine the value of $Z(\mathcal{B}, x!)$ which ran in time polynomial in the size of $\mathcal{B}$ and the length of the strings in $\mathcal{B}$, then we would have created here a polynomial time algorithm to determine the number of satisfying truth assignments for a D3CNF, a problem which is known to be #P-complete. This finishes the proof. □


5. Stochastic Approximations for $Z(\mathcal{B}, x!)$

In the previous section, we proved that calculating $Z(\mathcal{B}, x!)$ is a #P-complete problem. The natural next question is whether or not this value can be approximated. Viewing this problem as counting most parsimonious scenarios for the star phylogenetic tree with leaves labeled by the strings in $\mathcal{B}$, we are also interested in a near uniform sampler of these labelings.


Definition 5.1. A counting problem #A in #P has an FPAUS (fully polynomial almost uniform sampler) if there is a randomized algorithm such that, for any instance of #A and any $\varepsilon > 0$, the algorithm outputs an element $x \in X$, the solution space for #A, with probability $p(x)$ where

\ X! -U{x)\ < e 
xex 

where $U$ is the uniform distribution on $X$ and the algorithm runs in time polynomial in the size of the instance of #A and $-\log \varepsilon$.

Definition 5.2. A counting problem #A in #P has an FPRAS (fully polynomial randomized approximation scheme) if there is a randomized algorithm such that, for any instance of #A and any $\varepsilon, \delta > 0$, the algorithm outputs an approximation $\hat{f}$ for the true answer $f$ of the counting problem satisfying the following inequality

\[ P\left(\frac{f}{1 + \varepsilon} \le \hat{f} \le f(1 + \varepsilon)\right) \ge 1 - \delta. \tag{10} \]

Furthermore, the algorithm runs in time polynomial in the size of the instance of #A, $\varepsilon^{-1}$, and $-\log(\delta)$.


The modulo prime number computation technique which was used to prove that calculating $Z(\mathcal{B}, x!)$ is #P-complete has been used to show that other problems are #P-complete. For example, Brightwell and Winkler (1991) used this technique to prove that counting the number of linear extensions of a partially ordered set is #P-complete. For this same problem, Karzanov and Khachiyan (1991) found a rapidly mixing Markov chain to sample the linear extensions. Since counting the linear extensions of a partially ordered set is a self-reducible counting problem, this means that it also has an FPRAS (Jerrum, Valiant, and Vazirani 1986). This may suggest that our problem of counting most parsimonious scenarios also has an FPAUS and FPRAS. However, here we give a straightforward Markov chain to sample the most parsimonious scenarios that turns out to be torpidly mixing, suggesting that our problem may not have an FPAUS. With evidence for both the positive answer and the negative answer, the question of whether or not there is an FPAUS and FPRAS for $Z(\mathcal{B}, x!)$ remains open.

Recall that a median $\mu$ for $\mathcal{B} = \{\nu_i\}_{i=1}^{m}$ minimizes $\sum_{i=1}^{m} H(\nu_i, \mu)$. Therefore, in the $j$th bit, the value of $\mu$ must agree with a majority of the strings in $\mathcal{B}$. If exactly half of the strings in $\mathcal{B}$ have a 1 in the $j$th bit, then $\mu$ may take either a 0 or a 1 in the $j$th bit. We call such a bit an ambiguous bit. Therefore a median for $\mathcal{B}$ is determined by the value it takes in the ambiguous bits. As a result, if $\mathcal{B}$ has an odd number of strings, then there is exactly one median. Here we assume that the size of $\mathcal{B}$ is even.
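The majority rule above translates directly into code. The following Python sketch lists the ambiguous bits and enumerates all optimal medians of a small multiset; strings are assumed to be equal-length text over the characters 0 and 1, and the function names are hypothetical.

```python
from itertools import product


def ambiguous_bits(strings):
    """Positions where exactly half of the strings carry a 1."""
    m = len(strings)
    length = len(strings[0])
    return [j for j in range(length)
            if 2 * sum(s[j] == "1" for s in strings) == m]


def medians(strings):
    """All optimal medians: majority value on each bit, free choice on ambiguous bits."""
    m = len(strings)
    length = len(strings[0])
    ones = [sum(s[j] == "1" for s in strings) for j in range(length)]
    free = [j for j in range(length) if 2 * ones[j] == m]
    base = ["1" if 2 * ones[j] > m else "0" for j in range(length)]
    result = []
    for choice in product("01", repeat=len(free)):
        bits = base[:]
        for j, b in zip(free, choice):
            bits[j] = b
        result.append("".join(bits))
    return result
```

With an odd number of input strings no bit can be ambiguous, so `medians` returns a single string, in line with the remark above.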

Define a primer Markov chain, $P$, to transition between the medians. As mentioned, it suffices to define our Markov chain on the state space of all possible values that a median could take on the ambiguous bits. From any median, make a transition with the following probabilities:

• With probability 1/2, remain in the current state.

• With probability 1/2, select an ambiguous bit uniformly at random and change its value.

Because we remain at the current state with probability 1/2, by definition this is a lazy Markov chain.

Observation 5.3. The primer Markov chain $P$ is irreducible and aperiodic.


For a fixed multiset of strings $\mathcal{B} = \{\nu_i\}_{i=1}^{m}$ and median $\mu \in \mathcal{M}(\mathcal{B})$, define

\[ f(\mu) := \prod_{i=1}^{m} H(\mu, \nu_i)!. \]

Now we employ the Metropolis-Hastings algorithm (Metropolis et al. 1953) to obtain a secondary Markov chain $C$ with a desired limit distribution as follows. The states remain the same, but the transition probabilities are changed in the following way. From state $\mu$, we propose a next state $\mu'$ which differs from $\mu$ in at most one bit. If $\mu'$ is different from $\mu$, accept this transition with probability

\[ \min\left\{1, \frac{f(\mu')}{f(\mu)}\right\}. \]

In other words, if $\mu'$ was reached from $\mu$ with probability $P(\mu' \mid \mu)$, then in the secondary Markov chain $C$ the transition from $\mu$ to $\mu'$ will be made with probability

\[ C(\mu' \mid \mu) = P(\mu' \mid \mu) \cdot \min\left\{1, \frac{f(\mu')}{f(\mu)}\right\}. \]

For a given collection of strings, the function $f$ defines a probability distribution $\theta$ on the medians where $\theta(\mu)$ is directly proportional to $f(\mu)$. In other words,

\[ \theta(\mu) \propto f(\mu) \tag{11} \]

or $\theta(\mu) = k f(\mu)$ for some constant $k$ and any median $\mu$.

Observation 5.4. Markov chain $C$ is reversible and converges to the limit distribution $\theta$.

Therefore, we have a Markov chain on the state space of medians which, in the limit, will sample each median $\mu$ with probability proportional to $\prod_{i=1}^{m} H(\mu, \nu_i)!$. Once we have a median, it is easy to uniformly sample from the scenarios that it admits.
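Reversibility of the secondary chain can be checked exactly on a toy instance. The Python sketch below assumes the weight $f(\mu) = \prod_i H(\mu, \nu_i)!$ and the acceptance probability $\min\{1, f(\mu')/f(\mu)\}$ of a standard Metropolis step with the lazy single-bit proposal; the function names are hypothetical, strings are tuples of bits, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def f(mu, strings):
    """Weight of a median: product of H(mu, nu)! over the multiset."""
    w = 1
    for nu in strings:
        w *= factorial(hamming(mu, nu))
    return w


def transition_prob(mu, mu2, strings, ambiguous):
    """C(mu2 | mu) for mu2 != mu: lazy single-bit proposal times Metropolis acceptance."""
    diff = [i for i in range(len(mu)) if mu[i] != mu2[i]]
    if len(diff) != 1 or diff[0] not in ambiguous:
        return Fraction(0)
    proposal = Fraction(1, 2 * len(ambiguous))
    accept = min(Fraction(1), Fraction(f(mu2, strings), f(mu, strings)))
    return proposal * accept


def satisfies_detailed_balance(strings, ambiguous):
    """Check f(a) C(b|a) == f(b) C(a|b) for all pairs of states.

    Assumes every bit is ambiguous, so the state space is the full cube.
    """
    states = list(product((0, 1), repeat=len(strings[0])))
    return all(
        f(a, strings) * transition_prob(a, b, strings, ambiguous)
        == f(b, strings) * transition_prob(b, a, strings, ambiguous)
        for a in states for b in states if a != b
    )
```

Detailed balance with respect to $f$ is exactly what makes the normalized weights the stationary distribution of the chain.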

Now we will show that the Markov chain $C$ is torpidly mixing (not rapidly mixing). To prove this result, we will need the following definitions.

For any nonempty subset $S$ of the set of medians $\mathcal{M}(\mathcal{B})$, the capacity of $S$ is

\[ \theta(S) := \sum_{\mu \in S} \theta(\mu) \]

and the ergodic flow out of $S$ is

\[ F(S) := \sum_{\substack{\mu \in S \\ \nu \in \mathcal{M}(\mathcal{B}) \setminus S}} \theta(\mu)\, C(\nu \mid \mu). \]

The conductance is

\[ \Phi := \min_{S :\ 0 < \theta(S) \le 1/2} \frac{F(S)}{\theta(S)}. \]

Theorem 5.5 (Brémaud 2008). A Markov chain is rapidly mixing if and only if $\Phi \ge \frac{1}{p(n)}$ for some polynomial $p(n)$ which is not identically zero.

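Consider the following instance of $Z(\mathcal{B}, x!)$. Define $\nu$ and $\bar\nu$ to be the strings in $\{0,1\}^n$ where $\nu$ is the string of all 0s and $\bar\nu$ is the string of all 1s. Let $\mathcal{B} = \{\nu_i\}_{i=1}^{2t}$ be the multiset containing $t$ copies of $\nu$ and $t$ copies of $\bar\nu$. The set of medians $\mathcal{M}(\mathcal{B})$ is equal to $\{0,1\}^n$. Further, if $\mu$ has exactly $k$ ones, then

\[ \prod_{i=1}^{2t} H(\mu, \nu_i)! = (k!\,(n-k)!)^{t}. \]

Consequently

\[ Z(\mathcal{B}, x!) = \sum_{\mu \in \mathcal{M}(\mathcal{B})} \prod_{i=1}^{2t} H(\mu, \nu_i)! = \sum_{k=0}^{n} \binom{n}{k} (k!\,(n-k)!)^{t} =: T. \]

Therefore $\theta(\mu) = \frac{1}{T} (k!\,(n-k)!)^{t}$.

Suppose $n$ is odd. Consider the subset $S$ which contains all medians with at most $\lfloor n/2 \rfloor$ ones. For this subset, the capacity is $\theta(S) = \frac{1}{2}$ by symmetry.

Let $S'$ be the set of medians $\mu$ in $S$ with exactly $\lfloor n/2 \rfloor$ ones. Let $\bar{S}'$ be the set of medians in $\mathcal{M} \setminus S$ with exactly $\lceil n/2 \rceil$ ones. Then $|S'| = \binom{n}{\lfloor n/2 \rfloor} = \binom{n}{\lceil n/2 \rceil} = |\bar{S}'|$. For each $\mu \in S \setminus S'$ and $\nu \in \mathcal{M} \setminus S$, $C(\nu \mid \mu) = 0$. Further, for each $\mu \in S'$, there are only $\lceil n/2 \rceil$ medians $\nu$ in $\mathcal{M} \setminus S$ such that $C(\nu \mid \mu) \neq 0$, and for these medians $\nu \in \bar{S}'$ and $C(\nu \mid \mu) = \frac{1}{2n} \cdot 1$.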

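For the ergodic flow out of $S$, we have

\[ F(S) = \sum_{\substack{\mu \in S \\ \nu \in \mathcal{M} \setminus S}} \theta(\mu)\, C(\nu \mid \mu) = \sum_{\substack{\mu \in S' \\ \nu \in \mathcal{M} \setminus S}} \theta(\mu)\, C(\nu \mid \mu) = \sum_{\mu \in S'} \frac{1}{T} \left(\left\lfloor \tfrac{n}{2} \right\rfloor! \left\lceil \tfrac{n}{2} \right\rceil!\right)^{t} \left\lceil \tfrac{n}{2} \right\rceil \frac{1}{2n} \]

\[ = \binom{n}{\lfloor n/2 \rfloor} \frac{1}{T} \left(\left\lfloor \tfrac{n}{2} \right\rfloor! \left\lceil \tfrac{n}{2} \right\rceil!\right)^{t} \left\lceil \tfrac{n}{2} \right\rceil \frac{1}{2n} = \frac{1}{2n} \cdot \frac{n+1}{2} \cdot \frac{n!\, \left(\lfloor n/2 \rfloor!\, \lceil n/2 \rceil!\right)^{t-1}}{n! \sum_{k=0}^{n} (k!\,(n-k)!)^{t-1}} \]

\[ < \frac{1}{2} \cdot \frac{\left(\lfloor n/2 \rfloor!\, \lceil n/2 \rceil!\right)^{t-1}}{2\,(0!\,n!)^{t-1}} = \frac{1}{4} \left(\frac{1}{\binom{n}{\lfloor n/2 \rfloor}}\right)^{t-1}. \]

This implies

\[ \Phi \le \frac{F(S)}{\theta(S)} < \left(\frac{1}{\binom{n}{\lfloor n/2 \rfloor}}\right)^{t-1} \le \left(\frac{n+1}{2^{n}}\right)^{t-1} \le \left(\frac{1}{2^{n/2}}\right)^{t-1} = \frac{1}{2^{n(t-1)/2}}, \]

where the second-to-last inequality uses $\binom{n}{\lfloor n/2 \rfloor} \ge 2^n/(n+1)$ and the last one holds once $n + 1 \le 2^{n/2}$, that is, for $n \ge 7$.

Therefore, if $t > 1$, then as $n$ grows, we see that $\Phi$ cannot be lower-bounded by a function of the form $\frac{1}{p(n)}$ where $p$ is a polynomial in $n$. Therefore the Markov chain $C$ is torpidly mixing by Theorem 5.5.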
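For small odd $n$ the ratio $F(S)/\theta(S)$ for this bottleneck set can be computed exactly by enumerating all $2^n$ medians. The sketch below assumes the Metropolis transition probabilities with acceptance $\min\{1, f(\nu)/f(\mu)\}$ and the lazy single-bit proposal; the function name `conductance_ratio` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def conductance_ratio(n, t):
    """F(S)/theta(S) for S = medians with at most floor(n/2) ones,
    where the multiset holds t copies of 0^n and t copies of 1^n."""
    def f(mu):
        k = sum(mu)
        return (factorial(k) * factorial(n - k)) ** t

    states = list(product((0, 1), repeat=n))
    total = sum(f(mu) for mu in states)            # the normalizing constant T
    S = [mu for mu in states if sum(mu) <= n // 2]
    capacity = Fraction(sum(f(mu) for mu in S), total)

    flow = Fraction(0)
    for mu in S:
        for i in range(n):
            nu = mu[:i] + (1 - mu[i],) + mu[i + 1:]
            if sum(nu) > n // 2:                   # nu lies outside S
                accept = min(Fraction(1), Fraction(f(nu), f(mu)))
                flow += Fraction(f(mu), total) * Fraction(1, 2 * n) * accept
    return flow / capacity
```

For $n = 7$ and $t = 2$ this gives $3/448$, already well below $2^{-n(t-1)/2} \approx 0.088$, and the ratio keeps shrinking as $n$ grows.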

6. Complexity of computing $Z(\mathcal{B}, f(x))$

In this section, we consider the generalized $Z(\mathcal{B}, f(x))$. First, fix a continuous function $f : \mathbb{R} \to \mathbb{R}$. Then define the following problem:

Definition 6.1. Given an arbitrary $m \in \mathbb{Z}^+$, let $\mathcal{B} = \{\nu_i\}_{i=1}^{m}$ be an arbitrary multiset of binary strings. Determine the value of

\[ Z(\mathcal{B}, f(x)) = \sum_{\mu \in \mathcal{M}(\mathcal{B})} \prod_{i \in [m]} f(H(\mu, \nu_i)). \]

In Section 4, we showed that computing $Z(\mathcal{B}, x!)$ is #P-complete. Here we work toward determining the computational complexity of $Z(\mathcal{B}, f(x))$ for various functions $f(x)$. First, we formalize a definition and develop a couple of tools.

Definition 6.2. A function $g : \mathbb{R} \to \mathbb{R}$ is strictly concave up if for any $x, y, z \in \mathbb{R}$, $x < y < z$,

\[ g(y) < \frac{(z - y)\, g(x) + (y - x)\, g(z)}{z - x}. \]


Lemma 6.3. If $\log f(x)$ is a strictly concave up function, then for any $x \le y$ and $a > 0$,

\[ \frac{f(x) f(y)}{f(x - a) f(y + a)} < 1. \]

Proof. By the mean value theorem, there are real values $c, d$ with $c \in (x - a, x)$ and $d \in (y, y + a)$ such that

\[ (\log f)'(c) = \frac{1}{a}\big(\log f(x) - \log f(x - a)\big) \quad\text{and}\quad (\log f)'(d) = \frac{1}{a}\big(\log f(y + a) - \log f(y)\big). \]

Because $(\log f)'(x)$ is strictly increasing and $c < d$, $(\log f)'(c) < (\log f)'(d)$. Therefore,

\[ \frac{1}{a}\big(\log f(x) - \log f(x - a)\big) < \frac{1}{a}\big(\log f(y + a) - \log f(y)\big), \]
\[ \log f(x) - \log f(x - a) < \log f(y + a) - \log f(y), \]
\[ \log \frac{f(x)}{f(x - a)} < \log \frac{f(y + a)}{f(y)}, \]
\[ \frac{f(x)}{f(x - a)} < \frac{f(y + a)}{f(y)}, \]
\[ \frac{f(x) f(y)}{f(x - a) f(y + a)} < 1. \]

□


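Fact 6.4. Fix $k \in \mathbb{Z}^+ \cup \{0\}$. Let $f(x)$ be a function such that $\log f(x)$ is strictly concave up. Then

\[ \min_{\substack{a, b \in \mathbb{Z}^+ \cup \{0\} \\ a + b = k}} f(a) f(b) = f\left(\left\lfloor \tfrac{k}{2} \right\rfloor\right) f\left(\left\lceil \tfrac{k}{2} \right\rceil\right). \]

Proof. Let $x = \lfloor k/2 \rfloor$ and $y = \lceil k/2 \rceil$. By Lemma 6.3, $f(x - a) f(y + a) > f(x) f(y)$ for every $a > 0$, which gives the desired result. □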
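For the factorial weight the balanced split can be confirmed by brute force. The Python sketch below searches every split $a + b = k$ with $a \le b$; the function name `min_split` is hypothetical, and the factorial is used only as one example of a function whose logarithm is strictly convex on the non-negative integers.

```python
from math import factorial


def min_split(f, k):
    """Return (a, b) with a + b = k and a <= b that minimizes f(a) * f(b)."""
    candidates = ((a, k - a) for a in range(k // 2 + 1))
    return min(candidates, key=lambda ab: f(ab[0]) * f(ab[1]))
```

Calling `min_split(factorial, k)` returns the pair of floor and ceiling of $k/2$ for every $k$ tried, matching the statement above.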


Theorem 6.5. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ which satisfies the following properties:

• $\log f(x)$ is strictly concave up,

• the function values of $f$ can be computed in polynomial time, and

• for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)\,[f(n+1)]^{3}\,[f(n+2)]^{3}\,f(n+5)}{f(n-1)\,[f(n)]^{3}\,[f(n+3)]^{3}\,f(n+4)} > 1. \]

For arbitrary $m, s \in \mathbb{Z}^+$ and $D \in \mathbb{R}$, let $\mathcal{B} := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$. Then it is #P-complete to determine how many medians $\mu$ for $\mathcal{B}$ have

\[ \prod_{i \in [m]} f(H(\mu, \nu_i)) \le D. \tag{12} \]

Proof. Fix a function $f(x)$ with the properties listed in the theorem.

It is straightforward to see that the counting problem in (12) is in #P. Fix an instance consisting of integers $m$ and $s$, real number $D$, and a multiset $\mathcal{B}$ of binary strings of length $\ell$. Let $\mu$ be a binary string of the same length as each $\nu_i$. We can verify that $\mu$ is a median in time $O(m\ell)$. Each $H(\nu_i, \mu)$ can be computed in time $O(\ell)$. Because $H(\nu_i, \mu) \le \ell$, we can compute $f(H(\nu_i, \mu))$ in time polynomial in the size of the input by the conditions on $f$. Finally, checking if the product is at most $D$ is also a polynomial time calculation. Therefore the problem is in #P.

To prove #P-hardness, we will provide a reduction from #D3SAT. Fix $\Gamma$, a D3CNF with $n$ variables and $k$ clauses, and set

\[ K = [f(n)]^{9k}\,[f(n+1)]^{22k+2kn}\,[f(n+2)]^{48k+12kn} \cdot [f(n+3)]^{48k+12kn}\,[f(n+4)]^{22k+2kn}\,[f(n+5)]^{9k}. \]

The idea is to define a multiset, $\mathcal{D}$, of binary strings with the following properties:

• Each median $\mu$ which corresponds to a satisfying truth assignment for $\Gamma$ will have

\[ \prod_{\nu \in \mathcal{D}} f(H(\mu, \nu)) = K. \]

• Each other median $\mu'$ will have

\[ \prod_{\nu \in \mathcal{D}} f(H(\mu', \nu)) > K. \]

Create a total of $158k + 28kn$ strings of length $2n + 260k + 35kn$ with coordinates

\[ (x_1, y_1, x_2, y_2, \ldots, x_n, y_n, e_1, e_2, \ldots, e_t) \]

where $t = 260k + 35kn$. This multiset of binary strings will be defined as the union of three multisets:

\[ \mathcal{D} = \mathcal{A} \uplus \biguplus_{i \in [n]} \mathcal{B}_i \uplus \biguplus_{i \in [k]} \mathcal{C}'_i. \]

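As in Definition 3.4, we will define each string $\eta \in \mathcal{D}$ by explicitly giving the values of $\eta[x_i]$ and $\eta[y_i]$ for each $i \in [n]$ and telling the number of additional ones.

The collection $\mathcal{A}$ contains $108k$ strings. For $a \in [t]$, let $\alpha^{(+a)}$ be the string with $\alpha^{(+a)}[x_i] = \alpha^{(+a)}[y_i] = 1$ for all $1 \le i \le n$ and $a$ additional ones. Define $\bar\alpha^{(+a)}$ to be the binary string which is complementary to $\alpha^{(+a)}$ on the first $2n$ coordinates and has $a$ additional ones. The multiset $\mathcal{A}$ will consist of the following strings:

• $k$ copies each of $\alpha^{(+0)}$ and $\bar\alpha^{(+0)}$,

• $8k$ copies each of $\alpha^{(+1)}$ and $\bar\alpha^{(+1)}$,

• $18k$ copies each of $\alpha^{(+2)}$ and $\bar\alpha^{(+2)}$,

• $18k$ copies each of $\alpha^{(+3)}$ and $\bar\alpha^{(+3)}$,

• $8k$ copies each of $\alpha^{(+4)}$ and $\bar\alpha^{(+4)}$,

• $k$ copies each of $\alpha^{(+5)}$ and $\bar\alpha^{(+5)}$.

The collection $\mathcal{B} = \biguplus_{i \in [n]} \mathcal{B}_i$ contains $28kn$ strings. For each $i \in [n]$, $a \in [t]$, let $\beta_i^{(+a)}$ be the string with $\beta_i^{(+a)}[x_i] = \beta_i^{(+a)}[y_i] = 1$, for $j \neq i$, $\beta_i^{(+a)}[x_j] = \beta_i^{(+a)}[y_j] = 0$, with $a$ additional ones. Define the binary string $\bar\beta_i^{(+a)}$ to be complementary to $\beta_i^{(+a)}$ on the first $2n$ coordinates and have $a$ additional ones. The collection $\mathcal{B}_i$ consists of the following $28k$ strings:

• $k$ copies each of $\beta_i^{(+1)}$ and $\bar\beta_i^{(+1)}$,

• $6k$ copies each of $\beta_i^{(+2)}$ and $\bar\beta_i^{(+2)}$,

• $6k$ copies each of $\beta_i^{(+3)}$ and $\bar\beta_i^{(+3)}$,

• $k$ copies each of $\beta_i^{(+4)}$ and $\bar\beta_i^{(+4)}$.

The collection $\mathcal{C}' = \biguplus_{i \in [k]} \mathcal{C}'_i$ contains $50k$ strings. Each set $\mathcal{C}'_i$, which is associated with clause $c_i$, consists of 50 strings. In Section 3.2 we defined the set $\mathcal{C}_i$ through Table 1. For each $\nu_j^i \in \mathcal{C}_i$, create $\hat\nu_j^i$ by increasing the number of additional ones in $\nu_j^i$ by one. Then

\[ \mathcal{C}'_i := \{\hat\nu_j^i : j \in [50]\}. \]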

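Let $\mathcal{M}$ be the set of all medians for $\mathcal{D}$. From Definition 3.3 and for each clause $c_i$ in $\Gamma$, set

\[ \mathcal{M} := \mathcal{M}(\mathcal{D}), \qquad \mathcal{M}' := \mathcal{M}'(\mathcal{D}), \qquad \mathcal{M}'_{c_i} := \mathcal{M}'_{c_i}(\mathcal{D}), \qquad \mathcal{M}'_\Gamma := \mathcal{M}'_\Gamma(\mathcal{D}). \]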

According to Remark [3l4l all medians ^ must have = 0 for all i G [t]. In A, B, and C, the strings 
come in pairs where one is complementary to the other on the first 2n coordinates. By Fact 13.91 each of the 
Xi and Ui coordinates are ambiguous. Therefore 

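\[ \mathcal{M} = \{0,1\}^{2n} \times \{0\}^{t}, \qquad \mathcal{M}' = \{01, 10\}^{n} \times \{0\}^{t}. \]

Define

\[ \mathcal{H}(\mu, \mathcal{A}) := \prod_{\alpha \in \mathcal{A}} f(H(\mu, \alpha)). \]

Similarly define $\mathcal{H}(\mu, \mathcal{B}_i)$ and $\mathcal{H}(\mu, \mathcal{C}'_j)$. Set

\[ \mathcal{H}(\mu) := \mathcal{H}(\mu, \mathcal{A}) \cdot \prod_{i \in [n]} \mathcal{H}(\mu, \mathcal{B}_i) \cdot \prod_{j \in [k]} \mathcal{H}(\mu, \mathcal{C}'_j). \]

For each $\mu \in \mathcal{M}$, we obtain a lower bound for $\mathcal{H}(\mu)$, and for each $\mu \in \mathcal{M}'_\Gamma$ we describe an exact value for $\mathcal{H}(\mu)$. Divide $\mathcal{M}$ into 4 classes using the following three properties which a median $\mu \in \mathcal{M}$ may have.

Property 1. $\sum_{i \in [n]} (\mu[x_i] + \mu[y_i]) = n$.

Property 2. $\mu \in \mathcal{M}'$.

Property 3. $\mu \in \mathcal{M}'_\Gamma$.

Notice that these properties are nested. Any median $\mu \in \mathcal{M}$ with Property 2 must also have Property 1. Further, any $\mu \in \mathcal{M}$ with Property 3 must also have Property 2. The following claims provide lower bounds for medians according to their properties.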

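Claim 6.6. If $\mu \in \mathcal{M}$ has Property 1, then

\[ \mathcal{H}(\mu, \mathcal{A}) = [f(n)]^{2k}[f(n+1)]^{16k}[f(n+2)]^{36k} \cdot [f(n+3)]^{36k}[f(n+4)]^{16k}[f(n+5)]^{2k} =: \alpha_{good}. \tag{13} \]

Otherwise,

\[ \mathcal{H}(\mu, \mathcal{A}) \ge [f(n-1)f(n+1)]^{k}[f(n)f(n+2)]^{8k}[f(n+1)f(n+3)]^{18k} \cdot [f(n+2)f(n+4)]^{18k}[f(n+3)f(n+5)]^{8k}[f(n+4)f(n+6)]^{k} =: \alpha_{bad}. \tag{14} \]

Proof. If $\mu \in \mathcal{M}$ has Property 1, then $H(\mu, \alpha^{(+0)}) = H(\mu, \bar\alpha^{(+0)}) = n$ because $\alpha^{(+0)}[x_i] = \alpha^{(+0)}[y_i] = 1$ for all $i \in [n]$ while $\mu$ only has $n$ ones in the first $2n$ entries. Because $\mu[e_j] = 0$ for all $j \in [t]$, by Definition 3.4,

\[ H(\mu, \alpha^{(+a)}) = H(\mu, \bar\alpha^{(+a)}) = n + a. \]

Recalling the exact strings that appear in $\mathcal{A}$, we quickly obtain (13).

If $\mu$ does not have Property 1, then either $\mu$ has more than $n$ ones in the first $2n$ entries, implying $H(\mu, \bar\alpha^{(+0)}) > n$, or $\mu$ has fewer than $n$ ones in the first $2n$ entries, implying $H(\mu, \alpha^{(+0)}) > n$. By Fact 3.7, $H(\mu, \alpha^{(+0)}) + H(\mu, \bar\alpha^{(+0)}) = 2n$. By Fact 6.4 and the above observations,

\[ f(H(\mu, \alpha^{(+0)}))\, f(H(\mu, \bar\alpha^{(+0)})) \ge f(n-1) f(n+1), \]
\[ f(H(\mu, \alpha^{(+a)}))\, f(H(\mu, \bar\alpha^{(+a)})) \ge f(n-1+a) f(n+1+a). \]

Recalling the exact strings in $\mathcal{A}$, we obtain the lower bound in (14). □

Claim 6.7. For each $\mu \in \mathcal{M}$ and each $i \in [n]$,

\[ \mathcal{H}(\mu, \mathcal{B}_i) \ge [f(n+1)]^{2k}[f(n+2)]^{12k}[f(n+3)]^{12k}[f(n+4)]^{2k} =: \beta_{good}. \tag{15} \]

If $\mu$ has Property 2, then for every $i \in [n]$,

\[ \mathcal{H}(\mu, \mathcal{B}_i) = \beta_{good}. \]

If $\mu$ satisfies Property 1, but not Property 2, then there exists $i_0 \in [n]$ such that

\[ \mathcal{H}(\mu, \mathcal{B}_{i_0}) = [f(n-1)f(n+3)]^{k}[f(n)f(n+4)]^{6k} \cdot [f(n+1)f(n+5)]^{6k}[f(n+2)f(n+6)]^{k} =: \beta_{bad}. \tag{16} \]

Proof. For any $\mu \in \mathcal{M}$, by Fact 3.7, $H(\mu, \beta_i^{(+a)}) + H(\mu, \bar\beta_i^{(+a)}) = 2n + 2a$. By Fact 6.4, for each $a \in \mathbb{Z}^+ \cup \{0\}$,

\[ f(H(\mu, \beta_i^{(+0)}))\, f(H(\mu, \bar\beta_i^{(+0)})) \ge [f(n)]^{2}, \quad\text{and}\quad f(H(\mu, \beta_i^{(+a)}))\, f(H(\mu, \bar\beta_i^{(+a)})) \ge [f(n+a)]^{2}. \]

Therefore, for any $\mu \in \mathcal{M}$, $\mathcal{H}(\mu, \mathcal{B}_i) \ge \beta_{good}$.

If $\mu \in \mathcal{M}'$, then for each $j \in [n]$, $\mu[x_j] \neq \mu[y_j]$. On the other hand, for each $i \in [n]$, $j \in [n]$, $\beta_i^{(+a)}[x_j] = \beta_i^{(+a)}[y_j]$. Therefore for any $i, j \in [n]$,

\[ H\big((\mu[x_j], \mu[y_j]), (\beta_i^{(+a)}[x_j], \beta_i^{(+a)}[y_j])\big) = 1. \]

The same holds if $\beta_i^{(+a)}$ is replaced with $\bar\beta_i^{(+a)}$. Therefore,

\[ H(\mu, \beta_i^{(+0)}) = n, \qquad H(\mu, \beta_i^{(+a)}) = H(\mu, \bar\beta_i^{(+a)}) = n + a. \]

As a result $\mathcal{H}(\mu, \mathcal{B}_i) = \beta_{good}$.

If $\mu$ satisfies Property 1 but not Property 2, then we can define a tighter lower bound on $\mathcal{H}(\mu, \mathcal{B}_i)$. In particular, because $\mu \notin \mathcal{M}'$, there exists $i_0 \in [n]$ such that $\mu[x_{i_0}] = \mu[y_{i_0}]$. Recall $\beta_{i_0}^{(+a)}[x_{i_0}] = \beta_{i_0}^{(+a)}[y_{i_0}] = 1$ and $\bar\beta_{i_0}^{(+a)}[x_{i_0}] = \bar\beta_{i_0}^{(+a)}[y_{i_0}] = 0$. Therefore, if $\mu[x_{i_0}] = 1$, the pair $(\mu[x_{i_0}], \mu[y_{i_0}])$ is at distance 0 from the corresponding pair of $\beta_{i_0}^{(+a)}$ and at distance 2 from that of $\bar\beta_{i_0}^{(+a)}$; if $\mu[x_{i_0}] = 0$, the two distances are exchanged.

Because $\mu$ satisfies Property 1, there are exactly $n$ ones among the first $2n$ coordinates. Without loss of generality, $\mu[x_{i_0}] = \mu[y_{i_0}] = 1$. Set

\[ S := \{x_j, y_j : j \in [n], j \neq i_0\}. \]

Then $\mu$ has $n - 2$ ones and $n$ zeros among the coordinates in $S$. However, $\beta_{i_0}^{(+0)}$ takes the value 0 on each of the coordinates of $S$ and $\bar\beta_{i_0}^{(+0)}$ takes the value 1 on the coordinates of $S$. Therefore,

\[ H(\mu, \beta_{i_0}^{(+0)}) = 0 + (n - 2), \qquad H(\mu, \bar\beta_{i_0}^{(+0)}) = 2 + n, \]

and the case $\mu[x_{i_0}] = 0$ gives the same two values with the roles of $\beta_{i_0}^{(+0)}$ and $\bar\beta_{i_0}^{(+0)}$ exchanged. As a result,

\[ f(H(\mu, \beta_{i_0}^{(+0)}))\, f(H(\mu, \bar\beta_{i_0}^{(+0)})) = f(n-2) f(n+2), \]
\[ f(H(\mu, \beta_{i_0}^{(+a)}))\, f(H(\mu, \bar\beta_{i_0}^{(+a)})) = f(n-2+a) f(n+2+a). \]

Taking into account all binary strings in $\mathcal{B}_{i_0}$, we conclude $\mathcal{H}(\mu, \mathcal{B}_{i_0}) = \beta_{bad}$ in (16). □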

Fact 6.8. For the quantities defined in Claim 6.7, $\beta_{good} < \beta_{bad}$. Consequently, if $\mu \in \mathcal{M} \setminus \mathcal{M}'$ and satisfies Property 1, then $\prod_{i \in [n]} \mathcal{H}(\mu, \mathcal{B}_i) \ge \beta_{bad}\,\beta_{good}^{\,n-1}$. If $\mu \in \mathcal{M} \setminus \mathcal{M}'$ and does not satisfy Property 1, then $\prod_{i \in [n]} \mathcal{H}(\mu, \mathcal{B}_i) \ge \beta_{good}^{\,n}$.

Proof. Observe

\[ \frac{\beta_{bad}}{\beta_{good}} = \left[\frac{f(n-1)\,f(n)^{6}\,f(n+1)^{4}\,f(n+4)^{4}\,f(n+5)^{6}\,f(n+6)}{f(n+2)^{11}\,f(n+3)^{11}}\right]^{k} \]

\[ = \left[\frac{f(n-1) f(n+6)}{f(n+2) f(n+3)}\right]^{k} \left[\frac{f(n) f(n+5)}{f(n+2) f(n+3)}\right]^{6k} \left[\frac{f(n+1) f(n+4)}{f(n+2) f(n+3)}\right]^{4k} > 1 \]

where the last inequality follows from Lemma 6.3. □

Claim 6.9. For any $\mu \in \mathcal{M}$ and for each $j \in [k]$,

\[ \mathcal{H}(\mu, \mathcal{C}'_j) \ge [f(n+2) f(n+3)]^{25} =: \gamma_{min}. \]

If $\mu \in \mathcal{M}'_\Gamma$, then for each $j \in [k]$,

\[ \mathcal{H}(\mu, \mathcal{C}'_j) = [f(n)]^{7}[f(n+1)]^{6}[f(n+2)]^{12}[f(n+3)]^{12}[f(n+4)]^{6}[f(n+5)]^{7} =: \gamma_{good}. \tag{17} \]

If $\mu \in \mathcal{M}' \setminus \mathcal{M}'_\Gamma$, then there exists $i_0 \in [k]$ such that

\[ \mathcal{H}(\mu, \mathcal{C}'_{i_0}) = f(n-1)[f(n)]^{6}[f(n+1)]^{3}[f(n+2)]^{15} \cdot [f(n+3)]^{15}[f(n+4)]^{3}[f(n+5)]^{6} f(n+6) =: \gamma_{bad}. \tag{18} \]

Proof. Let $\mu$ be an arbitrary median in $\mathcal{M}$. By Remark 3.12, the binary strings in $\mathcal{C}_i$ come in pairs that are complementary on the first $2n$ entries. With a careful examination of Table 1, if $\eta, \eta' \in \mathcal{C}_i$ are complementary on the first $2n$ coordinates, then $e(\eta) + e(\eta') = 3$ where $e$ is the function specifying the number of additional ones. By the definition of $\mathcal{C}'_i$, the strings still come in complementary pairs, $(\hat\eta, \hat\eta')$, but here $e(\hat\eta) + e(\hat\eta') = 5$ because the number of additional ones in $\hat\eta$ and $\hat\eta'$ is precisely one more than the number in $\eta$ and $\eta'$. By Fact 3.7, for each of the 25 pairs in $\mathcal{C}'_i$,

\[ H(\mu, \hat\eta) + H(\mu, \hat\eta') = 2n + 5. \]

Then by Fact 6.4,

\[ f(H(\mu, \hat\eta))\, f(H(\mu, \hat\eta')) \ge f(n+2) f(n+3) \]

which gives the general bound $\gamma_{min}$.

Now suppose $\mu \in \mathcal{M}'_\Gamma$. This implies $\mu \in \mathcal{M}'_{c_i}$ for all clauses $c_i$ in $\Gamma$. By the definition of $\mathcal{C}'_i$, for each $\hat\nu_j^i \in \mathcal{C}'_i$, $H(\mu, \hat\nu_j^i) = H(\mu, \nu_j^i) + 1$ where $\nu_j^i \in \mathcal{C}_i$. From (2) we see

\[ \{H(\mu, \hat\nu_j^i) : j \in [50]\} = \{n_{(7)}, (n+1)_{(6)}, (n+2)_{(12)}, (n+3)_{(12)}, (n+4)_{(6)}, (n+5)_{(7)}\}. \]

This immediately implies $\mathcal{H}(\mu, \mathcal{C}'_i) = \gamma_{good}$ in (17).

Finally, suppose $\mu \in \mathcal{M}' \setminus \mathcal{M}'_\Gamma$. Using the bijection in Definition 3.1, $\mu$ must correspond to a truth assignment which does not satisfy $\Gamma$. So there is a clause $c_{i_0}$ in $\Gamma$ which is not satisfied. Therefore $\mu \in \mathcal{M}' \setminus \mathcal{M}'_{c_{i_0}}$. From (1), adding 1 to each $H(\mu, \nu_j^{i_0})$ to obtain $H(\mu, \hat\nu_j^{i_0})$, we obtain

\[ \{H(\mu, \hat\nu_j^{i_0}) : j \in [50]\} = \{(n-1)_{(1)}, n_{(6)}, (n+1)_{(3)}, (n+2)_{(15)}, (n+3)_{(15)}, (n+4)_{(3)}, (n+5)_{(6)}, (n+6)_{(1)}\}. \tag{19} \]

This directly implies $\mathcal{H}(\mu, \mathcal{C}'_{i_0}) = \gamma_{bad}$ in (18). □


Fact 6.10. For the quantities defined in Claim 6.9, $\gamma_{good} < \gamma_{bad}$. As a result, when $\mu \in \mathcal{M}' \setminus \mathcal{M}'_\Gamma$,

\[ \prod_{j \in [k]} \mathcal{H}(\mu, \mathcal{C}'_j) \ge \gamma_{bad}\,\gamma_{good}^{\,k-1}. \]

Proof. Indeed, this was our initial assumption:

\[ \frac{\gamma_{bad}}{\gamma_{good}} = \frac{f(n-1)\,[f(n+2)]^{3}\,[f(n+3)]^{3}\,f(n+6)}{f(n)\,[f(n+1)]^{3}\,[f(n+4)]^{3}\,f(n+5)} > 1. \]

The bound for the product over all clauses results from the fact that $\mu \in \mathcal{M}'$ corresponds to either a satisfying truth assignment or a non-satisfying truth assignment for each clause $c_i$. □

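In summary, Claims 6.6, 6.7, and 6.9 along with Facts 6.8 and 6.10 give the following bounds. If $\mu \in \mathcal{M}'_\Gamma$,

\[ \mathcal{H}(\mu) = \alpha_{good}\,\beta_{good}^{\,n}\,\gamma_{good}^{\,k} =: h_3. \]

If $\mu \in \mathcal{M}' \setminus \mathcal{M}'_\Gamma$,

\[ \mathcal{H}(\mu) \ge \alpha_{good}\,\beta_{good}^{\,n}\,\gamma_{bad}\,\gamma_{good}^{\,k-1} =: h_2. \]

If $\mu \in \mathcal{M} \setminus \mathcal{M}'$ and has Property 1,

\[ \mathcal{H}(\mu) \ge \alpha_{good}\,\beta_{bad}\,\beta_{good}^{\,n-1}\,\gamma_{min}^{\,k} =: h_1. \]

If $\mu \in \mathcal{M}$ but does not have Property 1,

\[ \mathcal{H}(\mu) \ge \alpha_{bad}\,\beta_{good}^{\,n}\,\gamma_{min}^{\,k} =: h_0. \]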

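In order to complete the proof, we only need to show $h_3 < h_i$ for $i \in \{0, 1, 2\}$. By one of our assumptions about $f(x)$, we have already verified in Fact 6.10 that

\[ \frac{h_2}{h_3} = \frac{\gamma_{bad}}{\gamma_{good}} > 1. \]

Next observe

\begin{align*}
\frac{h_1}{h_2} &= \frac{\beta_{bad}}{\beta_{good}} \cdot \frac{\gamma_{min}^{k}}{\gamma_{bad}\, \gamma_{good}^{k-1}} \\
&\ge \frac{\beta_{bad}}{\beta_{good}} \cdot \frac{\gamma_{min}^{k}}{\gamma_{bad}^{k}} \\
&= \left[\frac{f(n-1)f(n+3)}{[f(n+1)]^2}\right]^{k} \left[\frac{f(n)f(n+4)}{[f(n+2)]^2}\right]^{6k} \left[\frac{f(n+1)f(n+5)}{[f(n+3)]^2}\right]^{6k} \left[\frac{f(n+2)f(n+6)}{[f(n+4)]^2}\right]^{k} \\
&\quad \cdot \left[\frac{[f(n+2)]^{10}[f(n+3)]^{10}}{f(n-1)[f(n)]^6[f(n+1)]^3[f(n+4)]^3[f(n+5)]^6 f(n+6)}\right]^{k} \\
&= \left[\frac{f(n+1)f(n+4)}{f(n+2)f(n+3)}\right]^{k} \\
&> 1
\end{align*}

where the last inequality follows from Lemma 6.3.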
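Finally we prove that $h_0 \ge h_2$.

\begin{align*}
\frac{h_0}{h_2} &= \frac{\alpha_{bad}}{\alpha_{good}} \cdot \frac{\gamma_{min}^{k}}{\gamma_{bad}\, \gamma_{good}^{k-1}} \\
&\ge \frac{\alpha_{bad}}{\alpha_{good}} \cdot \frac{\gamma_{min}^{k}}{\gamma_{bad}^{k}} \\
&= \left[\frac{f(n-1)f(n+1)}{[f(n)]^2}\right]^{k} \left[\frac{f(n)f(n+2)}{[f(n+1)]^2}\right]^{8k} \left[\frac{f(n+1)f(n+3)}{[f(n+2)]^2}\right]^{18k} \\
&\quad \cdot \left[\frac{f(n+2)f(n+4)}{[f(n+3)]^2}\right]^{18k} \left[\frac{f(n+3)f(n+5)}{[f(n+4)]^2}\right]^{8k} \left[\frac{f(n+4)f(n+6)}{[f(n+5)]^2}\right]^{k} \\
&\quad \cdot \left[\frac{[f(n+2)]^{10}[f(n+3)]^{10}}{f(n-1)[f(n)]^6[f(n+1)]^3[f(n+4)]^3[f(n+5)]^6 f(n+6)}\right]^{k} \\
&= \frac{[f(n-1)]^{k}[f(n)]^{8k}[f(n+1)]^{19k}[f(n+2)]^{26k}[f(n+3)]^{26k}[f(n+4)]^{19k}[f(n+5)]^{8k}[f(n+6)]^{k}}{[f(n)]^{2k}[f(n+1)]^{16k}[f(n+2)]^{36k}[f(n+3)]^{36k}[f(n+4)]^{16k}[f(n+5)]^{2k}} \\
&\quad \cdot \frac{[f(n+2)]^{10k}[f(n+3)]^{10k}}{[f(n-1)]^{k}[f(n)]^{6k}[f(n+1)]^{3k}[f(n+4)]^{3k}[f(n+5)]^{6k}[f(n+6)]^{k}} \\
&= 1.
\end{align*}

Therefore for any $\mu \in \mathcal{M}'_\Gamma$ and $\mu' \in \mathcal{M} \setminus \mathcal{M}'_\Gamma$, we have $\mathcal{H}(\mu) < \mathcal{H}(\mu')$. Thus, if we could determine, in polynomial time, how many medians $\mu \in \mathcal{M}$ have $\mathcal{H}(\mu) \le h_3$, then we could determine how many satisfying truth assignments exist for $\Gamma$ in polynomial time. □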


Corollary 6.11. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ which satisfies the following properties:

• $\log f(x)$ is strictly concave up,

• the function values of $f$ can be computed in polynomial time, and

• for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} > 1. \]

For arbitrary $m, s \in \mathbb{Z}^+$ and $D \in \mathbb{R}$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$, and let $\mathcal{M}$ be the set of medians for $S$. Then it is NP-complete to determine if

\[ \min_{\mu \in \mathcal{M}} \prod_{i \in [m]} f(H(\nu_i, \mu)) \le D. \tag{20} \]


This next theorem gives the same result as Theorem 6.5 with one change in the conditions on $f$. While Theorem 6.5 required that

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} > 1, \]

Theorem 6.12 switches the inequality to consider functions in which the ratio is less than 1.


Theorem 6.12. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ which satisfies the following properties:

• $\log f(x)$ is strictly concave up,

• the function values of $f$ can be computed in polynomial time, and

• for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} < 1. \]

For arbitrary $m, s \in \mathbb{N}$ and $D \in \mathbb{R}$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$. Then it is #P-complete to determine how many medians $\mu$ for $S$ have

\[ \prod_{i \in [m]} f(H(\nu_i, \mu)) \le D. \]


Proof. This proof closely mirrors the proof of Theorem 6.5. Here we will note the changes that need to be made.

This time, we define $98k + 24kn$ binary strings, each of length $2n + 245k + 60kn$, with coordinates


{xi, yi ,..., Xfi, y^i, Cl,..., Ct)• 


Let and be defined as before. The collection A will now consist of the following 72k strings: 

• 4fc copies each of 0 ^+^) and 

• 14fc copies each of and 

• 14fc copies each of and 

• 4fc copies each of and 

Define and as before. The collection Bi now consists of the following 24fc strings: 

• 6k copies each of and , 

• 6k copies each of and . 


Following the explanation found in Section 3.1, Table 3 defines 26 binary strings $\mathcal{C}_i$ for a clause. As in the proof of Theorem 6.5, we will add 1 additional one to each of the 26 strings in $\mathcal{C}_i$ to create $\mathcal{C}'_i$.

Using the same Properties 1, 2, and 3 as before, we obtain the following values which are analogous to the bounds in Claims 6.6, 6.7 and 6.9:

\begin{align*}
\alpha_{good} &:= [f(n+1)]^{8k}[f(n+2)]^{28k}[f(n+3)]^{28k}[f(n+4)]^{8k}, \\
\alpha_{bad} &:= [f(n)f(n+2)]^{4k}[f(n+1)f(n+3)]^{14k} \cdot [f(n+2)f(n+4)]^{14k}[f(n+3)f(n+5)]^{4k}, \\
\beta_{good} &:= [f(n+2)]^{12k}[f(n+3)]^{12k}, \\
\beta_{min} &:= [f(n+2)]^{12k}[f(n+3)]^{12k}, \\
\beta_{bad} &:= [f(n)f(n+4)]^{6k}[f(n+1)f(n+5)]^{6k}, \\
\gamma_{good} &:= f(n-1)[f(n)]^{3}[f(n+1)]^{3}[f(n+2)]^{6} \cdot [f(n+3)]^{6}[f(n+4)]^{3}[f(n+5)]^{3} f(n+6), \\
\gamma_{bad} &:= [f(n)]^{4}[f(n+1)]^{6}[f(n+2)]^{3}[f(n+3)]^{3}[f(n+4)]^{6}[f(n+5)]^{4}, \\
\gamma_{min} &:= [f(n+2)]^{13}[f(n+3)]^{13}.
\end{align*}


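By our assumption about $f(x)$,

\[ \frac{\gamma_{bad}}{\gamma_{good}} = \frac{f(n)[f(n+1)]^{3}[f(n+4)]^{3} f(n+5)}{f(n-1)[f(n+2)]^{3}[f(n+3)]^{3} f(n+6)} > 1, \]

which implies $\gamma_{bad} > \gamma_{good}$.

Next we determine the values of $h_0, h_1, h_2, h_3$ in this setting. As before, if $\mu$ has Property $i$, but not Property $i+1$, then $\mathcal{H}(\mu) \ge h_i$. Further, if $\mu$ has Property 3, $\mathcal{H}(\mu) = h_3$. If $\mu$ does not have Property 1, then $\mathcal{H}(\mu) \ge h_0$.

\begin{align*}
h_3 &:= \alpha_{good}\, \beta_{good}^{n}\, \gamma_{good}^{k}, \\
h_2 &:= \alpha_{good}\, \beta_{good}^{n}\, \gamma_{bad}\, \gamma_{good}^{k-1}, \\
h_1 &:= \alpha_{good}\, \beta_{bad}\, \beta_{min}^{n-1}\, \gamma_{min}^{k}, \\
h_0 &:= \alpha_{bad}\, \beta_{min}^{n}\, \gamma_{min}^{k}.
\end{align*}

As in the proof of Theorem 6.5, we will show $h_3 < h_0, h_1, h_2$.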
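By our assumption about $f(x)$,

\[ \frac{h_2}{h_3} = \frac{\gamma_{bad}}{\gamma_{good}} > 1. \]

Also

\begin{align*}
\frac{h_1}{h_2} &= \frac{\beta_{bad}}{\beta_{good}} \cdot \frac{\gamma_{min}^{k}}{\gamma_{bad}\, \gamma_{good}^{k-1}} \\
&\ge \frac{\beta_{bad}}{\beta_{good}} \cdot \frac{\gamma_{min}^{k}}{\gamma_{bad}^{k}} \\
&= \left[\frac{f(n)f(n+4)}{[f(n+2)]^2}\right]^{6k} \left[\frac{f(n+1)f(n+5)}{[f(n+3)]^2}\right]^{6k} \cdot \left[\frac{[f(n+2)]^{10}[f(n+3)]^{10}}{[f(n)]^{4}[f(n+1)]^{6}[f(n+4)]^{6}[f(n+5)]^{4}}\right]^{k} \\
&= \left[\frac{f(n)f(n+5)}{f(n+2)f(n+3)}\right]^{2k} \\
&> 1
\end{align*}

by Lemma 6.3.

Finally we show $h_0 \ge h_2$.

\begin{align*}
\frac{h_0}{h_2} &= \frac{\alpha_{bad}}{\alpha_{good}} \cdot \frac{\gamma_{min}^{k}}{\gamma_{bad}\, \gamma_{good}^{k-1}} \\
&\ge \frac{\alpha_{bad}}{\alpha_{good}} \cdot \frac{\gamma_{min}^{k}}{\gamma_{bad}^{k}} \\
&= \left[\frac{f(n)f(n+2)}{[f(n+1)]^2}\right]^{4k} \left[\frac{f(n+1)f(n+3)}{[f(n+2)]^2}\right]^{14k} \left[\frac{f(n+2)f(n+4)}{[f(n+3)]^2}\right]^{14k} \left[\frac{f(n+3)f(n+5)}{[f(n+4)]^2}\right]^{4k} \\
&\quad \cdot \left[\frac{[f(n+2)]^{10}[f(n+3)]^{10}}{[f(n)]^{4}[f(n+1)]^{6}[f(n+4)]^{6}[f(n+5)]^{4}}\right]^{k} \\
&= \frac{[f(n)]^{4k}[f(n+1)]^{14k}[f(n+2)]^{18k}[f(n+3)]^{18k}[f(n+4)]^{14k}[f(n+5)]^{4k}}{[f(n+1)]^{8k}[f(n+2)]^{28k}[f(n+3)]^{28k}[f(n+4)]^{8k}} \cdot \frac{[f(n+2)]^{10k}[f(n+3)]^{10k}}{[f(n)]^{4k}[f(n+1)]^{6k}[f(n+4)]^{6k}[f(n+5)]^{4k}} \\
&= 1.
\end{align*}

Making each of these changes in the proof of Theorem 6.5, we complete the proof of Theorem 6.12. □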
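
The exponent bookkeeping in the two ratio computations above is easy to get wrong by hand, so it can help to check it mechanically. The following is a minimal sketch, not part of the proof: each monomial is stored as a map from an offset $j$ to the exponent of $f(n+j)$, with $k = 1$ (all exponents scale linearly in $k$). The dictionaries are an assumption in the sense that they transcribe the definitions of $\alpha$, $\beta$, $\gamma$ as read from this proof; the helper name `combine` is hypothetical.

```python
# Each monomial maps an offset j to the exponent of f(n + j), taking k = 1
# (every exponent in the proof scales linearly with k).
# Assumption: these values transcribe the definitions used for Theorem 6.12.
ALPHA_GOOD = {1: 8, 2: 28, 3: 28, 4: 8}
ALPHA_BAD = {0: 4, 1: 14, 2: 18, 3: 18, 4: 14, 5: 4}
BETA_GOOD = {2: 12, 3: 12}
BETA_BAD = {0: 6, 1: 6, 4: 6, 5: 6}
GAMMA_MIN = {2: 13, 3: 13}
GAMMA_BAD = {0: 4, 1: 6, 2: 3, 3: 3, 4: 6, 5: 4}
GAMMA_GOOD = {-1: 1, 0: 3, 1: 3, 2: 6, 3: 6, 4: 3, 5: 3, 6: 1}


def combine(*terms):
    """Multiply monomials raised to integer powers (a negative power divides)."""
    total = {}
    for monomial, power in terms:
        for offset, exponent in monomial.items():
            total[offset] = total.get(offset, 0) + power * exponent
    # Drop cancelled factors, so the constant 1 is the empty dict.
    return {offset: e for offset, e in sorted(total.items()) if e != 0}


# (alpha_bad / alpha_good) * (gamma_min / gamma_bad): lower bound for h_0 / h_2
h0_over_h2_bound = combine((ALPHA_BAD, 1), (ALPHA_GOOD, -1),
                           (GAMMA_MIN, 1), (GAMMA_BAD, -1))
# (beta_bad / beta_good) * (gamma_min / gamma_bad): lower bound for h_1 / h_2
h1_over_h2_bound = combine((BETA_BAD, 1), (BETA_GOOD, -1),
                           (GAMMA_MIN, 1), (GAMMA_BAD, -1))
# gamma_bad / gamma_good, the ratio assumed to exceed 1
gamma_ratio = combine((GAMMA_BAD, 1), (GAMMA_GOOD, -1))
```

Under these assumptions the first bound cancels completely, the second reduces to $[f(n)f(n+5)]^{2}/[f(n+2)f(n+3)]^{2}$ per unit of $k$, and `gamma_ratio` reproduces the shifted form of the ratio in the theorem statement.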

Table 3. The 26 strings to complement the collection in Table 1 along with their Hamming distance from medians in $\mathcal{M}'$. Column A: values of $\nu_j$ on its support set. Column B: $\nu_j[x_\ell]$, ($v_\ell \notin c_i$). Column C: additional ones. Columns M1 to M8 are headed by the corresponding 6-bit patterns.

Row  A        B    C     M1      M2      M3      M4      M5      M6      M7      M8
  #                      101010  101001  100110  011010  100101  011001  010110  010101
  1  010000   0    +0    n+1     n+1     n+1     n-1     n+1     n-1     n-1     n-1
  2  000100   0    +0    n+1     n+1     n-1     n+1     n-1     n+1     n-1     n-1
  3  000001   0    +0    n+1     n-1     n+1     n+1     n-1     n-1     n+1     n-1
  4  101111   1    +3    n+2     n+2     n+2     n+4     n+2     n+4     n+4     n+4
  5  111011   1    +3    n+2     n+2     n+4     n+2     n+4     n+2     n+4     n+4
  6  111110   1    +3    n+2     n+4     n+2     n+2     n+4     n+4     n+2     n+4
  7  010100   0    +2    n+4     n+4     n+2     n+2     n+2     n+2     n       n
  8  010001   0    +2    n+4     n+2     n+4     n+2     n+2     n       n+2     n
  9  000101   0    +2    n+4     n+2     n+2     n+4     n       n+2     n+2     n
 10  101011   1    +1    n-1     n-1     n+1     n+1     n+1     n+1     n+3     n+3
 11  101110   1    +1    n-1     n+1     n-1     n+1     n+1     n+3     n+1     n+3
 12  111010   1    +1    n-1     n+1     n+1     n-1     n+3     n+1     n+1     n+3
 13  100101   0    +1    n+2     n       n       n+4     n-2     n+2     n+2     n
 14  011001   0    +1    n+2     n       n+4     n       n+2     n-2     n+2     n
 15  010110   0    +1    n+2     n+4     n       n       n+2     n+2     n-2     n
 16  101001   0    +1    n       n-2     n+2     n+2     n       n       n+4     n+2
 17  100110   0    +1    n       n+2     n-2     n+2     n       n+4     n       n+2
 18  011010   0    +1    n       n+2     n+2     n-2     n+4     n       n       n+2
 19  101010   0    +1    n-2     n       n       n       n+2     n+2     n+2     n+4
 20  010101   1    +2    n+5     n+3     n+3     n+3     n+1     n+1     n+1     n-1
 21  100101   1    +2    n+3     n+1     n+1     n+5     n-1     n+3     n+3     n+1
 22  011001   1    +2    n+3     n+1     n+5     n+1     n+3     n-1     n+3     n+1
 23  010110   1    +2    n+3     n+5     n+1     n+1     n+3     n+3     n-1     n+1
 24  101001   1    +2    n+1     n-1     n+3     n+3     n+1     n+1     n+5     n+3
 25  100110   1    +2    n+1     n+3     n-1     n+3     n+1     n+5     n+1     n+3
 26  011010   1    +2    n+1     n+3     n+3     n-1     n+5     n+1     n+1     n+3

The information in this table is to be read in the same way as the information in Table 1. This is detailed in Section 3.1.

Corollary 6.13. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ which satisfies the following properties:

• $\log f(x)$ is strictly concave up,

• the function values of $f$ can be computed in polynomial time, and

• for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} < 1. \]

For arbitrary $m, s \in \mathbb{Z}^+$ and $D \in \mathbb{R}$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$, and let $\mathcal{M}$ be the set of medians for $S$. Then it is NP-complete to determine if

\[ \min_{\mu \in \mathcal{M}} \prod_{i \in [m]} f(H(\nu_i, \mu)) \le D. \tag{21} \]


We can also state the following corollaries for functions which are strictly concave down. 


Corollary 6.14. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ which satisfies the following properties:

• $\log f(x)$ is strictly concave down,

• the function values of $f$ can be computed in polynomial time, and

• for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} \ne 1. \]

For arbitrary $m, s \in \mathbb{Z}^+$ and $D \in \mathbb{R}$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$. Then it is #P-complete to determine how many medians $\mu$ for $S$ have

\[ \prod_{i \in [m]} f(H(\nu_i, \mu)) \ge D. \tag{22} \]

Proof. If the function $f(x)$ has the property that $\log f(x)$ is strictly concave down, then $\log \frac{1}{f(x)}$ is strictly concave up. Therefore by Theorems 6.5 and 6.12 for the function $\frac{1}{f(x)}$, it is #P-hard to determine the number of medians $\mu$ which satisfy $\prod_{i \in [m]} \frac{1}{f(H(\nu_i, \mu))} \le \frac{1}{D}$. This is equivalent to asking for the number of medians $\mu$ which have $\prod_{i \in [m]} f(H(\nu_i, \mu)) \ge D$. □


Corollary 6.15. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ which satisfies the following properties:

• $\log f(x)$ is strictly concave down,

• the function values of $f$ can be computed in polynomial time, and

• for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} \ne 1. \]

For arbitrary $m, s \in \mathbb{Z}^+$ and $D \in \mathbb{R}$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$, where $\mathcal{M}$ is the set of medians for $S$. Then it is NP-complete to determine if

\[ \max_{\mu \in \mathcal{M}} \prod_{i \in [m]} f(H(\nu_i, \mu)) \ge D. \tag{23} \]


7. Stochastic Approximations for $Z(\mathcal{B}, f(x))$

We have seen several proofs showing that it is hard to calculate many of these quantities. As in Section 5, we may further ask if any of these quantities can be approximated. We will again focus on approximations via an FPRAS (Definition 5.2).

Before stating our results, we define a couple more complexity classes for decision problems:

Definition 7.1 (Gill 1977). A decision problem, $A$, is in the class RP (randomized polynomial time) if there is a probabilistic Turing machine that runs in polynomial time in the size of the input, returns “true” with probability at least $\frac{1}{2}$ when the answer for $A$ is true, and returns “false” with probability 1 when the answer for $A$ is false.

Definition 7.2 (Gill 1977). A decision problem, $A$, is in the class BPP (bounded-error probabilistic polynomial time) if there is a probabilistic Turing machine that runs in polynomial time in the size of the input, returns “true” with probability at least $\frac{2}{3}$ when the answer for $A$ is true, and returns “false” with probability at least $\frac{2}{3}$ when the answer for $A$ is false.

One result connecting these classes is the following:

Theorem 7.3 (Papadimitriou 1994). If the intersection of the NP-complete problems and BPP is non-empty, then RP=NP.

Note that each result below holds for functions $f(x)$ with $\log f(x)$ strictly concave down. The analogous results for the functions whose logarithm is concave up are still open. Our first result can be interpreted as sampling medians for #SPSCJ with a probability distribution analogous to the number of scenarios, but dependent on $f(x)$.

Theorem 7.4. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ which satisfies the following properties:

• $\log f(x)$ is strictly concave down,

• the function values of $f$ can be computed in polynomial time, and

• there exists $\epsilon > 0$ such that for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} \le 1 - \epsilon. \]

For arbitrary $m, s \in \mathbb{Z}^+$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$. If there is a rapidly mixing Markov chain with stationary distribution proportional to $\prod_{i \in [m]} f(H(\nu_i, \mu))$, then RP=NP.

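Proof. Fix a function $f$ as described in the theorem. Because $\log f(x)$ is strictly concave down, $\log (f(x))^{-1}$ is strictly concave up. Set $g(x) := (f(x))^{-1}$.

Now recall the proof of Theorem 6.5 for strictly concave up functions. Take a D3CNF $\Gamma$ with $n$ variables and create a multiset of binary strings, $\mathcal{D}$. The set of medians for $\mathcal{D}$ is $\mathcal{M} = \{0,1\}^{2n} \times \{0\}^{t}$. There is a one-to-one correspondence between the medians in the subset $\mathcal{M}' = \{01, 10\}^{n} \times \{0\}^{t}$ and the truth assignments for $\Gamma$. Those medians which correspond to satisfying truth assignments for $\Gamma$ form the set $\mathcal{M}'_\Gamma$. The multiset $\mathcal{D}$ is constructed so that each $\mu \in \mathcal{M}'_\Gamma$ has

\[ \prod_{\eta \in \mathcal{D}} \frac{1}{f(H(\eta, \mu))} = \prod_{\eta \in \mathcal{D}} g(H(\eta, \mu)) = \alpha_{good}\, \beta_{good}^{n}\, \gamma_{good}^{k} =: h_3 \]

and all other medians have

\[ \prod_{\eta \in \mathcal{D}} \frac{1}{f(H(\eta, \mu))} = \prod_{\eta \in \mathcal{D}} g(H(\eta, \mu)) \ge \alpha_{good}\, \beta_{good}^{n}\, \gamma_{bad}\, \gamma_{good}^{k-1} =: h_2. \]

Equivalently, if $\mu \in \mathcal{M}'_\Gamma$, then

\[ \prod_{\eta \in \mathcal{D}} f(H(\eta, \mu)) = \frac{1}{h_3}. \]

Otherwise,

\[ \prod_{\eta \in \mathcal{D}} f(H(\eta, \mu)) \le \frac{1}{h_2}. \]

Further,

\begin{align*}
\frac{h_2}{h_3} = \frac{\gamma_{bad}}{\gamma_{good}} &= \frac{g(n-1)[g(n+2)]^3[g(n+3)]^3 g(n+6)}{g(n)[g(n+1)]^3[g(n+4)]^3 g(n+5)} \\
&= \frac{f(n)[f(n+1)]^3[f(n+4)]^3 f(n+5)}{f(n-1)[f(n+2)]^3[f(n+3)]^3 f(n+6)} \\
&\ge \frac{1}{1 - \epsilon}
\end{align*}

where the last inequality is a result of the assumption in the theorem statement. As a result,

\[ \frac{1}{h_2} \le \frac{1 - \epsilon}{h_3}. \]

Now select an integer $r$, dependent only on the values of $n$ and $\epsilon$, such that $\left(\frac{1}{1-\epsilon}\right)^{r} > 2^{2n+2}$. Create a new multiset $\mathcal{D}(r)$ of binary strings such that

\[ \mathcal{D}(r) := \underbrace{\mathcal{D} \uplus \mathcal{D} \uplus \cdots \uplus \mathcal{D}}_{r \text{ times}}. \]

The set of medians for $\mathcal{D}(r)$ is the same as the set of medians for $\mathcal{D}$. However this time, for a median $\mu \in \mathcal{M}'_\Gamma$,

\[ \prod_{\eta \in \mathcal{D}(r)} f(H(\eta, \mu)) = \left(\frac{1}{h_3}\right)^{r}. \]

Otherwise, if $\mu \in \mathcal{M} \setminus \mathcal{M}'_\Gamma$,

\[ \prod_{\eta \in \mathcal{D}(r)} f(H(\eta, \mu)) \le \left(\frac{1}{h_2}\right)^{r}. \]

By the choice of $r$, for $\mu \in \mathcal{M} \setminus \mathcal{M}'_\Gamma$,

\[ 2^{2n+2} \prod_{\eta \in \mathcal{D}(r)} f(H(\eta, \mu)) \le 2^{2n+2} \left(\frac{1}{h_2}\right)^{r} < \left(\frac{1}{h_2} \cdot \frac{1}{1-\epsilon}\right)^{r} \le \left(\frac{1}{h_3}\right)^{r}. \]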


Since M = {0,1}^" x {0}*, |A1| = 2^” and the above inequality shows that for each ^0 G Alp, 


Further, 

ijeX) /iGA4r;eX> 

Now suppose that we had a rapidly mixing Markov chain on the medians for this instance as stated in 
the theorem. From the calculations above, it must sample medians which correspond to satisfying truth 
assignments for F with probability at least This is precisely an RP for D3SAT. However, this immediately 
implies RP=NP because D3SAT is NP-complete. □ 


The following theorem gives the same result as the last one for different functions $f$. In particular, it switches the inequality that $f$ is required to satisfy.

Theorem 7.5. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ which satisfies the following properties:

• $\log f(x)$ is strictly concave down,

• the function values of $f$ can be computed in polynomial time, and

• there exists $\epsilon > 0$ such that for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} \ge 1 + \epsilon. \]

For arbitrary $m, s \in \mathbb{Z}^+$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$. If there is a rapidly mixing Markov chain with stationary distribution proportional to $\prod_{i \in [m]} f(H(\nu_i, \mu))$, then RP=NP.

Proof. The proof for this theorem follows the same line of reasoning as the proof for Theorem 7.4. However, it makes use of details in Theorem 6.12 rather than Theorem 6.5. □


When $f$ is a function for which $\log f(x)$ is concave down, we examine the possibility of an FPRAS (Definition 5.2) for $Z(\mathcal{B}, f(x))$.

Theorem 7.6. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ for which:

• $\log f(x)$ is strictly concave down,

• the function values of $f$ can be computed in polynomial time, and

• there exists $\epsilon > 0$ such that for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} \le 1 - \epsilon. \]

For arbitrary $m, s \in \mathbb{Z}^+$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$. If there is an FPRAS for calculating

\[ Z(S, f(x)) = \sum_{\mu \in \mathcal{M}(S)} \prod_{i \in [m]} f(H(\nu_i, \mu)), \]

then RP=NP.


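Proof. Let $r$ be an integer so that $\left(\frac{1}{1-\epsilon}\right)^{r} > 2^{2n+2}$. In the proof of Theorem 7.4 we created a new multiset of strings $\mathcal{D}(r)$. The set of medians $\mathcal{M}$ for $\mathcal{D}(r)$ is precisely $\{0,1\}^{2n} \times \{0\}^{t}$ and each median $\mu \in \mathcal{M}'_\Gamma$ which corresponds to a satisfying truth assignment for $\Gamma$ has

\[ \prod_{\eta \in \mathcal{D}(r)} f(H(\eta, \mu)) = \left(\frac{1}{h_3}\right)^{r}. \]

All other medians have

\[ \prod_{\eta \in \mathcal{D}(r)} f(H(\eta, \mu)) \le \left(\frac{1}{h_2}\right)^{r}. \]

Therefore, if $\Gamma$ has no satisfying assignments,

\[ \sum_{\mu \in \mathcal{M}} \prod_{\eta \in \mathcal{D}(r)} f(H(\eta, \mu)) \le 2^{2n} \left(\frac{1}{h_2}\right)^{r}. \]

If there is a satisfying assignment for $\Gamma$, then

\[ \sum_{\mu \in \mathcal{M}} \prod_{\eta \in \mathcal{D}(r)} f(H(\eta, \mu)) \ge \left(\frac{1}{h_3}\right)^{r}. \]

By the choice of $r$, we have the following inequality to relate the two quantities:

\[ \left(\frac{1}{h_3}\right)^{r} > 2^{2n+2} \left(\frac{1}{h_2}\right)^{r}. \]

Now suppose that there is an FPRAS for $T := \sum_{\mu \in \mathcal{M}} \prod_{\eta \in \mathcal{D}(r)} f(H(\eta, \mu))$. In other words, for any $\epsilon, \delta > 0$, there is a randomized algorithm as described in Definition 5.2 which outputs a quantity $\hat{T}$ such that

\[ P\left(\frac{T}{1+\epsilon} \le \hat{T} \le T(1+\epsilon)\right) \ge 1 - \delta. \]

Consider the case when $\delta = \frac{1}{3}$ and $\epsilon = 1$. Then

\[ P\left(\frac{1}{2} T \le \hat{T} \le 2T\right) \ge \frac{2}{3}. \]

Therefore, if $\Gamma$ can be satisfied, then $T \ge \left(\frac{1}{h_3}\right)^{r}$ and the probability that $\hat{T}$ is at least $\frac{1}{2}\left(\frac{1}{h_3}\right)^{r} > 2^{2n+1}\left(\frac{1}{h_2}\right)^{r}$ is at least $\frac{2}{3}$. On the other hand, if $\Gamma$ cannot be satisfied, then $T \le 2^{2n}\left(\frac{1}{h_2}\right)^{r}$ and the probability that $\hat{T}$ is at most $2T \le 2^{2n+1}\left(\frac{1}{h_2}\right)^{r}$ is at least $\frac{2}{3}$. Therefore, we have a BPP algorithm (Definition 7.2) for D3SAT. Because D3SAT is NP-complete, Papadimitriou's Theorem 7.3 implies RP=NP. □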
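
The proof only needs some integer $r$ with $\left(\frac{1}{1-\epsilon}\right)^{r} > 2^{2n+2}$, and such an $r$ grows linearly in $n$ for fixed $\epsilon$, so the multiset $\mathcal{D}(r)$ stays polynomial in size. The following sketch computes the smallest such $r$ with exact rational arithmetic to avoid floating-point ties; the function name `smallest_r` is hypothetical and the threshold $2^{2n+2}$ is taken from the argument above.

```python
from fractions import Fraction


def smallest_r(n, eps):
    """Smallest positive integer r with (1 / (1 - eps))**r > 2**(2n + 2)."""
    eps = Fraction(eps)
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    base = 1 / (1 - eps)          # exact rational, strictly greater than 1
    target = 2 ** (2 * n + 2)
    r, power = 1, base
    while power <= target:        # strict inequality is required in the proof
        power *= base
        r += 1
    return r
```

Since $\ln\frac{1}{1-\epsilon} \ge \epsilon$, the loop stops after at most $(2n+2)\ln 2 / \epsilon + 1$ iterations.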


A similar result holds for functions $f(x)$ which satisfy the opposite inequality. We do not give a proof as it follows the same reasoning as the proof of Theorem 7.6.

Theorem 7.7. Fix a function $f(x) : \mathbb{Z}^+ \cup \{0\} \to [0, \infty)$ for which:

• $\log f(x)$ is strictly concave down,

• the function values of $f$ can be computed in polynomial time, and

• there exists $\epsilon > 0$ such that for all but finitely many $n \in \mathbb{Z}$, $n \ge 2$,

\[ \frac{f(n-2)[f(n+1)]^3[f(n+2)]^3 f(n+5)}{f(n-1)[f(n)]^3[f(n+3)]^3 f(n+4)} \ge 1 + \epsilon. \]

For arbitrary $m, s \in \mathbb{Z}^+$, let $S := \{\nu_1, \nu_2, \ldots, \nu_m\}$ be a multiset of binary strings, each of length $s$. If there is an FPRAS for calculating

\[ Z(S, f(x)) = \sum_{\mu \in \mathcal{M}(S)} \prod_{i \in [m]} f(H(\nu_i, \mu)), \]

then RP=NP.


8. Set-up for Results on Binary Trees

Previously, we explored the value $Z(\mathcal{B}, x!)$ as it related to the number of ways to label a star phylogenetic tree. In this section, we divert our exploration to binary phylogenetic trees. First we give a precise definition of a binary tree.

Definition 8.1. A tree is a binary tree if it is rooted and every non-leaf vertex has exactly two children.

Fix a multiset $\mathcal{B}$ of $m$ binary strings from $\{0,1\}^{n}$. Also fix a binary tree $T$ with $m$ leaves. Label the leaves of $T$ with the strings from $\mathcal{B}$ via the surjective function $\varphi : L(T) \to \mathcal{B}$. The equivalent of a median in this setting is a labeling $\varphi' : V(T) \to \{0,1\}^{n}$ which agrees with $\varphi$ on $L(T)$ and minimizes $\sum_{uv \in E(T)} H(\varphi'(u), \varphi'(v))$. Such a vertex labeling $\varphi'$ is called a most parsimonious labeling. Let $\mathcal{M}(T, \varphi)$ be the set of most parsimonious labelings which extend $\varphi$ for the binary tree $T$.

As with the star trees, we will label each edge of $T$ with a scenario. Given a most parsimonious labeling $\varphi'$ for $V(T)$, a scenario for the edge $uv$ is a permutation of the bits in which $\varphi'(u)$ and $\varphi'(v)$ differ.

A most parsimonious scenario for a binary tree $T$ with leaf labeling $\varphi : L(T) \to \mathcal{B}$ consists of a most parsimonious labeling $\varphi'$ of the vertices of $T$ and a scenario to label each edge of $T$. We desire to count the number of most parsimonious scenarios, which is

\[ Z_{T,\varphi}(\mathcal{B}, x!) := \sum_{\varphi' \in \mathcal{M}(T, \varphi)} \prod_{uv \in E(T)} H(\varphi'(u), \varphi'(v))!. \]

Formally, the problem statement is below:

Definition 8.2 (#Binary). Given an arbitrary integer $m \ge 2$, let $T$ be a binary tree with $m$ leaves. Let $\mathcal{B} = \{\nu_i\}_{i=1}^{m}$ be an arbitrary multiset of binary strings and let $\varphi : L(T) \to \mathcal{B}$ be a surjective function. Determine the value of $Z_{T,\varphi}(\mathcal{B}, x!)$.
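
For very small instances the quantity in Definition 8.2 can be computed by exhaustive search, which is useful for checking intuition about the definition. The sketch below is a brute-force illustration only (exponential in the number of internal vertices and in the string length), not an algorithm from the paper. The tree encoding is an assumption made for the example: `children` maps each internal vertex to its two children, and `leaf_label` maps each leaf to a tuple of bits.

```python
from itertools import product
from math import factorial, prod


def hamming(a, b):
    """Number of coordinates in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def count_scenarios(children, leaf_label):
    """Return (minimum total Hamming cost, number of most parsimonious scenarios)."""
    internal = list(children)                       # vertices with two children
    length = len(next(iter(leaf_label.values())))   # common string length
    edges = [(u, v) for u in internal for v in children[u]]
    best, total = None, 0
    # Try every assignment of bit strings to the internal vertices.
    for choice in product(product((0, 1), repeat=length), repeat=len(internal)):
        label = dict(leaf_label)
        label.update(zip(internal, choice))
        dists = [hamming(label[u], label[v]) for u, v in edges]
        cost = sum(dists)
        weight = prod(factorial(d) for d in dists)  # scenarios for this labeling
        if best is None or cost < best:
            best, total = cost, weight
        elif cost == best:
            total += weight
    return best, total
```

For a root with two leaves labeled 00 and 11, all four root labels have total cost 2, and the scenario count is $0!\,2! + 2!\,0! + 1!\,1! + 1!\,1! = 6$.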

The main result of Section 9 is the theorem which states that #Binary is #P-complete. In this section, we develop several tools and algorithms which lay the foundation for our main theorem.

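Let $\Gamma = c_1 \wedge c_2 \wedge \cdots \wedge c_k$ be a D3CNF with variables $\{v_1, v_2, \ldots, v_n\}$. Select new variables $\{w_1, w_2, \ldots, w_n\}$ which do not occur in $\Gamma$. For each $i \in [n]$, interpreting subscript $n+1$ as 1, define the following D3CNF,

\[ \Phi_i := (v_i \vee w_i \vee v_{i+1}) \wedge (v_i \vee w_i \vee \overline{v_{i+1}}) \wedge (\overline{v_i} \vee \overline{w_i} \vee v_{i+1}) \wedge (\overline{v_i} \vee \overline{w_i} \vee \overline{v_{i+1}}). \tag{24} \]

Observe that $\Phi_i$ is equivalent to the “exclusive or” $(v_i \vee w_i) \wedge (\overline{v_i} \vee \overline{w_i})$. Define

\[ \Psi(\Gamma) := \Gamma \wedge \bigwedge_{i=1}^{n} \Phi_i. \tag{25} \]

Necessarily, if $\Gamma$ is a D3CNF then so is $\Psi(\Gamma)$.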
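
The construction in (24) and (25) can be tried out on small formulas. The following sketch builds the clause list of $\Psi(\Gamma)$ and counts satisfying assignments by brute force. The encoding is an assumption chosen for the example: a literal is a nonzero integer, variable $v_i$ is the integer $i$, variable $w_i$ is the integer $n + i$, and a negative sign denotes negation. The function names are hypothetical.

```python
from itertools import product


def build_psi(clauses, n):
    """Clauses of Psi(Gamma); v_i is literal i and w_i is literal n + i."""
    out = [tuple(c) for c in clauses]
    for i in range(1, n + 1):
        v, w = i, n + i
        nxt = i % n + 1            # subscript n + 1 is read as 1
        # The four clauses of Phi_i; together they force exactly one of v_i, w_i.
        out += [(v, w, nxt), (v, w, -nxt), (-v, -w, nxt), (-v, -w, -nxt)]
    return out


def count_satisfying(clauses, num_vars):
    """Brute-force count of truth assignments satisfying every clause."""
    count = 0
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            count += 1
    return count
```

For example, with $n = 3$ and $\Gamma = (v_1 \vee v_2 \vee v_3) \wedge (\overline{v_1} \vee v_2 \vee \overline{v_3})$, both $\Gamma$ (over 3 variables) and $\Psi(\Gamma)$ (over 6 variables) have the same number of satisfying assignments, since each $w_i$ is forced to take the opposite value of $v_i$.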

Lemma 8.3. For $\Gamma$, an arbitrary D3CNF, it is #P-complete to determine the number of satisfying truth assignments for $\Psi(\Gamma)$.

Proof. We have already shown in Lemma 2.9 that #D3SAT is #P-complete. So to prove this result, we will show that the satisfying truth assignments for $\Gamma$ and for $\Psi(\Gamma)$ are in one-to-one correspondence.

Any truth assignment which satisfies $\Psi(\Gamma)$, when restricted to $\{v_1, v_2, \ldots, v_n\}$, will necessarily satisfy $\Gamma$. For the other direction, recall that $\Phi_i$ is equivalent to the “exclusive or” for $v_i$ and $w_i$. Therefore, given a satisfying truth assignment for $\Gamma$, we can create a unique satisfying truth assignment for $\Psi(\Gamma)$ by assigning to each $w_i$ the opposite value of $v_i$. □

Next we provide two different algorithms for creating most parsimonious labelings given a rooted binary tree and a leaf-labeling $\varphi : L(T) \to \{0,1\}^{n}$. If we restrict to a single coordinate $c$ for every leaf, we obtain a labeling $\varphi_c : L(T) \to \{0,1\}$. Each algorithm presented below will consider leaf labels from the set $\{0,1\}$ and output a most parsimonious labeling $\varphi'_c : V(T) \to \{0,1\}$. Obtaining a most parsimonious labeling for each coordinate in this way, we combine these labelings to create a most parsimonious labeling $\varphi' : V(T) \to \{0,1\}^{n}$ for $T$ and the original leaf-labeling $\varphi$.

Let $T$ be a binary tree with root $\rho$ and let $\varphi : L(T) \to \{0,1\}$ be a labeling for the leaves. Let $\varphi' : V(T) \to \{0,1\}$ be a most parsimonious labeling which extends $\varphi$. Because each vertex is labeled with a single bit, $H(\varphi'(u), \varphi'(v)) \in \{0,1\}$ for any edge $uv$. By definition, the most parsimonious labeling $\varphi'$ minimizes the sum $\sum_{uv \in E(T)} H(\varphi'(u), \varphi'(v))$. Consequently, $\varphi'$ must minimize the number of edges $uv$ such that $\varphi'(u) \ne \varphi'(v)$.

First, we have Fitch's algorithm to find most parsimonious labelings.


Fitch's Algorithm (Fitch 1971). Let $T$ be a binary tree with root $\rho$ and leaf-labeling $\varphi : L(T) \to \{0,1\}$. The following algorithm, completed in two parts, will find a most parsimonious labeling $\varphi' : V(T) \to \{0,1\}$ which extends $\varphi$.

Part 1: Define a function $B$ on the vertices of $T$ as follows: For each leaf $\ell$, set $B(\ell) := \{\varphi(\ell)\}$. Extend this assignment to all vertices of $T$ by the following rule: For a vertex $u$ with children $v_1, v_2$ such that $B(v_1)$ and $B(v_2)$ have been defined, set

\[ B(u) := \begin{cases} B(v_1) \cap B(v_2) & \text{if } B(v_1) \cap B(v_2) \ne \emptyset, \\ B(v_1) \cup B(v_2) & \text{otherwise.} \end{cases} \tag{26} \]

Part 2: Select a single element $a \in B(\rho)$. Define a function $\varphi'$ on the vertices of $T$ as follows: Set $\varphi'(\rho) := a$. Extend $\varphi'$ to $V(T)$ by the following rule: If $v$ is a child of $u$ and $\varphi'(u)$ is defined, then

\[ \varphi'(v) := \begin{cases} \varphi'(u) & \text{if } \varphi'(u) \in B(v), \\ 1 - \varphi'(u) & \text{if } \varphi'(u) \notin B(v). \end{cases} \tag{27} \]

The resulting $\varphi'$ is a most parsimonious labeling extending $\varphi$ and is called a Fitch solution.
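
The two parts of Fitch's algorithm translate directly into a bottom-up pass followed by a top-down pass. The following is a minimal sketch for a single coordinate, under an assumed tree encoding that is not from the paper: `children` maps each internal vertex to its pair of children, `leaf_label` maps each leaf to a bit, and the hypothetical parameter `pick` chooses the element $a \in B(\rho)$.

```python
def fitch(children, leaf_label, root, pick=min):
    """Return (B, labeling) for one coordinate, following rules (26) and (27)."""
    B = {}

    def up(u):
        # Part 1: leaves get singleton sets, internal vertices use rule (26).
        if u not in children:
            B[u] = {leaf_label[u]}
            return
        v1, v2 = children[u]
        up(v1)
        up(v2)
        common = B[v1] & B[v2]
        B[u] = common if common else B[v1] | B[v2]

    up(root)
    label = {root: pick(B[root])}

    def down(u):
        # Part 2: a child keeps the parent's bit whenever rule (27) allows it.
        for v in children.get(u, ()):
            label[v] = label[u] if label[u] in B[v] else 1 - label[u]
            down(v)

    down(root)
    return B, label
```

For the tree with root `r`, children `x` and leaf `c`, where `x` has leaves `a` and `b`, and leaf bits $a = 0$, $b = 1$, $c = 1$, Part 1 gives $B(x) = \{0,1\}$ and $B(r) = \{1\}$, and Part 2 labels both `r` and `x` with 1, so only the edge to `a` witnesses a change. The recursion is fine for small examples; a production version would iterate to avoid deep recursion.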


While Fitch solutions are most parsimonious labelings, there are cases when Fitch's algorithm finds some of the most parsimonious labelings but not all of them. However, Sankoff's algorithm, described below, will produce all most parsimonious labelings (Erdős and Székely 1994).


Sankoff's Algorithm (Erdős and Székely 1994; Sankoff and Rousseau 1975). Let $T$ be a binary tree with root $\rho$ and leaf labeling $\varphi : L(T) \to \{0,1\}$. This algorithm is completed in two parts.

Part 1: Define functions $s_0$ and $s_1$ on the vertices of $T$ as follows: First, for each leaf $\ell$,

\[ s_0(\ell) := \begin{cases} 0 & \text{if } \varphi(\ell) = 0, \\ \infty & \text{otherwise,} \end{cases} \qquad s_1(\ell) := \begin{cases} 0 & \text{if } \varphi(\ell) = 1, \\ \infty & \text{otherwise.} \end{cases} \tag{28} \]

Extend these functions recursively to all vertices by the following: If $v_0$ and $v_1$ are children of $u$ and $s_i(v_j)$ has been defined for all $i, j \in \{0,1\}$, then

\[ s_0(u) := \min\{s_0(v_0), s_1(v_0) + 1\} + \min\{s_0(v_1), s_1(v_1) + 1\}, \tag{29} \]

\[ s_1(u) := \min\{s_0(v_0) + 1, s_1(v_0)\} + \min\{s_0(v_1) + 1, s_1(v_1)\}. \tag{30} \]

Note: For any $v \in V(T)$, $s_i(v)$ counts the minimum number of edges, within the subtree containing $v$ and its descendants, that will witness a change if a most parsimonious labeling assigned label $i$ to vertex $v$. A leaf will have $s_0(\ell) = \infty$ (or $s_1(\ell) = \infty$) if it is impossible for a most parsimonious labeling to label $\ell$ with a 0 (1), because most parsimonious labelings must agree with the original leaf label.

Part 2: For each $v \in V(T)$, select $a_v \in \{0,1\}$. Define the function $\varphi'$ on the vertices of $T$ as follows: For root $\rho$, define

\[ \varphi'(\rho) := \begin{cases} 0 & \text{if } s_0(\rho) < s_1(\rho), \\ a_\rho & \text{if } s_0(\rho) = s_1(\rho), \\ 1 & \text{if } s_0(\rho) > s_1(\rho). \end{cases} \]

Extend $\varphi'$ to $V(T)$ by the following rule: If $v$ is a child of $u$ and $\varphi'(u)$ is defined, then define $\varphi'(v)$ as follows: If $\varphi'(u) = 0$, then

\[ \varphi'(v) := \begin{cases} 0 & \text{if } s_0(v) < s_1(v) + 1, \\ a_v & \text{if } s_0(v) = s_1(v) + 1, \\ 1 & \text{if } s_0(v) > s_1(v) + 1. \end{cases} \tag{31} \]

If $\varphi'(u) = 1$, then

\[ \varphi'(v) := \begin{cases} 1 & \text{if } s_1(v) < s_0(v) + 1, \\ a_v & \text{if } s_1(v) = s_0(v) + 1, \\ 0 & \text{if } s_1(v) > s_0(v) + 1. \end{cases} \tag{32} \]

The resulting $\varphi'$ is a most parsimonious labeling for $T$ extending $\varphi$ and is called a Sankoff solution.
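
Sankoff's algorithm can be sketched in the same style: a bottom-up pass computes the pairs $(s_0, s_1)$, and a top-down pass enumerates every labeling obtainable from the rules in Part 2 by trying both values of $a_v$ at each tie. This is an illustrative sketch for a single coordinate, with an assumed encoding (`children` maps internal vertices to their two children, `leaf_label` maps leaves to bits); the function names are hypothetical.

```python
from math import inf


def sankoff_costs(children, leaf_label, root):
    """Part 1: return s[v] = (s_0(v), s_1(v)) for every vertex v."""
    s = {}

    def up(u):
        if u not in children:
            s[u] = (0, inf) if leaf_label[u] == 0 else (inf, 0)
            return
        total = [0, 0]
        for v in children[u]:
            up(v)
            total[0] += min(s[v][0], s[v][1] + 1)   # rule (29)
            total[1] += min(s[v][0] + 1, s[v][1])   # rule (30)
        s[u] = tuple(total)

    up(root)
    return s


def sankoff_solutions(children, leaf_label, root):
    """Part 2: list every labeling reachable by some choice of the a_v."""
    s = sankoff_costs(children, leaf_label, root)

    def options(v, parent_bit):
        # Rules (31) and (32): keep the parent's bit, flip it, or allow both on a tie.
        stay, flip = s[v][parent_bit], s[v][1 - parent_bit] + 1
        if stay < flip:
            return [parent_bit]
        if stay > flip:
            return [1 - parent_bit]
        return [0, 1]

    def extend(u, bit):
        # All labelings of the subtree below u in which u receives `bit`.
        if u not in children:
            return [{u: bit}]
        v1, v2 = children[u]
        out = []
        for b1 in options(v1, bit):
            for left in extend(v1, b1):
                for b2 in options(v2, bit):
                    for right in extend(v2, b2):
                        out.append({u: bit, **left, **right})
        return out

    s0, s1 = s[root]
    root_bits = [0] if s0 < s1 else [1] if s1 < s0 else [0, 1]
    return [lab for b in root_bits for lab in extend(root, b)]
```

As a small example, take root `r` with children `u` and leaf `e`, where `u` has children `v` and leaf `d`, and `v` has leaves `a`, `b`, with bits $a = 0$, $b = 1$, $d = 0$, $e = 1$. The sketch returns three labelings of cost 2, with $(\varphi'(r), \varphi'(u), \varphi'(v))$ equal to $(0,0,0)$, $(1,0,0)$ and $(1,1,1)$. Applying Fitch's rules by hand to the same tree gives only the first two, which is the kind of gap the discussion above refers to.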


The following lemma draws a connection between the solutions found from each algorithm.

Lemma 8.4. Let $T$ be a binary tree with leaf-labeling $\varphi : L(T) \to \{0,1\}$. Suppose that, for each $u, v \in V(T)$ with $v$ a child of $u$, the function $B$ in Fitch's algorithm satisfies

\[ B(v) = \{0,1\} \Rightarrow B(u) = \{0,1\}. \tag{33} \]

Then for $T$ and $\varphi$, all Sankoff solutions are Fitch solutions. In other words, Fitch's algorithm finds all most parsimonious labelings.

In order to prove Lemma 8.4, we first establish a series of claims (8.5 through 8.8) under the assumptions of Lemma 8.4.

Claim 8.5. For any non-leaf vertex $v$, if $B(v) = \{x\}$ for some $x \in \{0,1\}$ in Fitch's algorithm, then $s_x(v) = 0$ and $s_{1-x}(v) = 2$ in Sankoff's algorithm.

Proof. The proof proceeds by reverse induction on the distance from the root. For the base case, we consider those vertices whose children are both leaves. Let $v$ be such a vertex with children $v_\ell$ and $v_r$. By symmetry of the argument, assume $B(v) = \{0\}$. Then $B(v_\ell) = B(v_r) = \{0\}$, which only happens for leaves if $\varphi(v_\ell) = \varphi(v_r) = 0$. By (28), $s_0(v_\ell) = s_0(v_r) = 0$ and $s_1(v_\ell) = s_1(v_r) = \infty$. As desired, (29) implies $s_0(v) = 0$ and (30) implies $s_1(v) = 2$.

For the inductive hypothesis, assume that each non-leaf vertex $v$ of distance at least $d \ge 1$ from the root with $B(v)$ a singleton has either $s_0(v) = 0$ and $s_1(v) = 2$ or $s_0(v) = 2$ and $s_1(v) = 0$. Let $u$ be a vertex of distance $d - 1$ from the root. Again, we assume $B(u) = \{0\}$ as the argument for the case when $B(u) = \{1\}$ is very similar. This vertex has two children, $u_\ell$ and $u_r$. There are three cases to consider.

(1) If $u_\ell$ and $u_r$ are leaves, then the argument in the base case gives $s_0(u) = 0$ and $s_1(u) = 2$ as desired.

(2) If $u_\ell$ is a leaf and $u_r$ is not a leaf, then $s_0(u_\ell) = 0$ and $s_1(u_\ell) = \infty$ and, by the inductive hypothesis, $s_0(u_r) = 0$ and $s_1(u_r) = 2$. Therefore (29) implies $s_0(u) = 0$ and (30) implies $s_1(u) = 2$.

(3) If $u_\ell$ and $u_r$ are not leaves, then by the inductive hypothesis, $s_0(u_\ell) = s_0(u_r) = 0$ and $s_1(u_\ell) = s_1(u_r) = 2$. Again, (29) implies $s_0(u) = 0$ and (30) implies $s_1(u) = 2$.

This completes the proof of the claim. □


Claim 8.6. For any vertex $v$ with $B(v) = \{0,1\}$ from Fitch's algorithm, we will have $s_0(v) = s_1(v)$ in Sankoff's algorithm.

Proof. This claim is also proven by induction on distance from the root where the base case examines those vertices with greatest distance from the root.



For the base case, let $v$ be a vertex with $B(v) = \{0,1\}$ such that none of its descendants $u$ have $B(u) = \{0,1\}$. For children $v_\ell$ and $v_r$ of $v$ we may assume $B(v_\ell) = \{0\}$ and $B(v_r) = \{1\}$ by (26). By Claim 8.5 (with $\infty$ in place of 2 when the child is a leaf, by (28)),

\[ s_0(v_\ell) = s_1(v_r) = 0 \quad \text{and} \quad s_0(v_r) = s_1(v_\ell) = 2. \]

Therefore $s_0(v) = s_1(v) = 1$.

For the inductive hypothesis, suppose all vertices $u$ with $B(u) = \{0,1\}$ of distance at least $d \ge 1$ from the root have $s_0(u) = s_1(u)$. Let $v$ be a vertex at distance $d - 1$ from the root with $B(v) = \{0,1\}$. There are three cases to consider:

(1) If $v$ has a child $v_\ell$ with $B(v_\ell) = \{0\}$, then by (26) the other child $v_r$ must have $B(v_r) = \{1\}$ and we can use the argument in the base case to see $s_0(v) = s_1(v)$.

(2) If $v$ has a child $v_\ell$ with $B(v_\ell) = \{1\}$, then by (26), $v$ must have another child $v_r$ with $B(v_r) = \{0\}$. This puts us back in case 1.

(3) If $v$ has a child $v_\ell$ with $B(v_\ell) = \{0,1\}$, then by (26), $v$ must have another child $v_r$ with $B(v_r) = \{0,1\}$. By the inductive hypothesis, $s_0(v_\ell) = s_1(v_\ell)$ and $s_0(v_r) = s_1(v_r)$. By (29) and (30), $s_0(v) = s_1(v)$.

This completes the proof of the claim. □

Claim 8.7. For any non-leaf vertex $v$ with $B(v) = \{i\}$ ($i \in \{0,1\}$), both Fitch's algorithm and Sankoff's algorithm will define $\varphi'(v) = i$.

Proof. In Fitch's algorithm, this is an immediate consequence of (27).

Now consider Sankoff's algorithm. If $B(v) = \{0\}$ then, by Claim 8.5, $s_0(v) = 0$ and $s_1(v) = 2$. Observe $s_0(v) < s_1(v) + 1$ and $s_1(v) > s_0(v) + 1$. Therefore $\varphi'(v) = 0$ by (31) and (32). The case $B(v) = \{1\}$ is symmetric. □

Claim 8.8. Suppose $B(\rho) = \{0,1\}$. For any vertex $v$ with $B(v) = \{0,1\}$, if both algorithms set $\varphi'(\rho) := 0$, then both Fitch's algorithm and Sankoff's algorithm will set $\varphi'(v) = 0$. Likewise, if $\varphi'(\rho) = 1$, then both algorithms will set $\varphi'(v) = 1$.

Proof. For any vertex $v$ with $B(v) = \{0,1\}$, there is a path $\rho = u_0, u_1, \ldots, u_{t-1}, u_t = v$ of vertices such that $B(u_i) = \{0,1\}$ for each $i \in [t]$. It suffices to show that, in both algorithms, if $\varphi'(u_i) = 0$ for some $0 \le i < t$, then $\varphi'(u_{i+1}) = 0$.

In Part 2 of Fitch's algorithm, if $\varphi'(u_i) = 0$ and $B(u_{i+1}) = \{0,1\}$, then (27) implies $\varphi'(u_{i+1}) = 0$.

In Sankoff's algorithm, if $\varphi'(u_i) = 0$ and $B(u_{i+1}) = \{0,1\}$, then by Claim 8.6, $s_0(u_{i+1}) = s_1(u_{i+1})$. Thus $s_0(u_{i+1}) < s_1(u_{i+1}) + 1$. Since $\varphi'(u_i) = 0$, (31) implies $\varphi'(u_{i+1}) = 0$.

A similar argument can be used to show that if $\varphi'(\rho) = 1$, then $\varphi'(v) = 1$. Therefore Fitch's algorithm and Sankoff's algorithm will agree on $\varphi'(v)$ if they agree on $\varphi'(\rho)$. □

Proof of Lemma 8.4. In each algorithm, once $\varphi'(\rho)$ has been set, the algorithm deterministically outputs a most parsimonious labeling of all vertices. Therefore, it suffices to prove that both algorithms have the same choices for labeling the root and both algorithms output the same most parsimonious labeling for the same choice for $\varphi'(\rho)$.

If $B(\rho) = \{0\}$ or $B(\rho) = \{1\}$, then there is only one choice in Fitch's algorithm for $\varphi'(\rho)$. By Claim 8.5, Sankoff's algorithm has the same determined value for $\varphi'(\rho)$. Further, all vertices $v \in V(T)$ will have either $B(v) = \{0\}$ or $B(v) = \{1\}$ by condition (33), and Claim 8.7 completes the proof.

If $B(\rho) = \{0,1\}$, then in Fitch's algorithm, there are two choices for $\varphi'(\rho)$. By Claim 8.6, $s_0(\rho) = s_1(\rho)$ in Sankoff's algorithm, which means there are also two choices for $\varphi'(\rho)$. Claim 8.8 implies that if we make the same choice for the root, both algorithms give the same most parsimonious labeling $\varphi'$.

Since Sankoff's algorithm is guaranteed to find all most parsimonious labelings and the most parsimonious labelings from Fitch's algorithm coincide with those from Sankoff's algorithm, Fitch's algorithm finds all most parsimonious labelings. □

As mentioned earlier, these algorithms are designed for a tree $T$ with leaf-labeling $\varphi : L(T) \to \{0,1\}$.
However, given a tree $T$ with leaf-labeling $\varphi : L(T) \to \{0,1\}^n$, we can restrict all strings to a single
coordinate and run one of the above algorithms to find a most parsimonious labeling for $V(T)$ in that
coordinate. Repeat this for each coordinate. The most parsimonious labelings found for each coordinate can
then be combined into a most parsimonious labeling of $V(T)$ that extends $\varphi$.
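
The coordinate-by-coordinate procedure can be sketched as follows. This is a minimal illustration, assuming that Part 1 and Part 2 of Fitch's algorithm are the usual bottom-up set computation and top-down assignment; the tree encoding (a dictionary from internal vertices to their two children) and the function names are our own assumptions and do not come from the text.

```python
from typing import Dict, Set, Tuple

# Rooted binary tree: each internal vertex maps to (left child, right child).
Tree = Dict[str, Tuple[str, str]]


def fitch_sets(tree: Tree, root: str, leaf_bits: Dict[str, int]) -> Dict[str, Set[int]]:
    """Part 1 (bottom-up): B(v) is the intersection of the children's sets
    when that intersection is non-empty, and their union otherwise."""
    B: Dict[str, Set[int]] = {}

    def visit(v: str) -> Set[int]:
        if v not in tree:  # v is a leaf
            B[v] = {leaf_bits[v]}
        else:
            left, right = tree[v]
            a, b = visit(left), visit(right)
            B[v] = (a & b) or (a | b)
        return B[v]

    visit(root)
    return B


def fitch_labeling(tree: Tree, root: str, B: Dict[str, Set[int]], root_bit: int) -> Dict[str, int]:
    """Part 2 (top-down): a child keeps its parent's bit when B(child)
    allows it, and otherwise takes the single bit in B(child)."""
    if root_bit not in B[root]:
        raise ValueError("the root bit must be chosen from B(root)")
    phi = {root: root_bit}
    stack = [root]
    while stack:
        v = stack.pop()
        for child in tree.get(v, ()):
            phi[child] = phi[v] if phi[v] in B[child] else next(iter(B[child]))
            stack.append(child)
    return phi


def fitch_strings(tree: Tree, root: str, leaf_strings: Dict[str, str], root_string: str) -> Dict[str, str]:
    """Run both parts on every coordinate and glue the bits back into strings."""
    columns = []
    for j, ch in enumerate(root_string):
        bits = {leaf: int(s[j]) for leaf, s in leaf_strings.items()}
        B = fitch_sets(tree, root, bits)
        columns.append(fitch_labeling(tree, root, B, int(ch)))
    return {v: "".join(str(col[v]) for col in columns) for v in columns[0]}


# Hypothetical toy instance: three leaves labeled with strings of length 2.
tree = {"r": ("a", "l3"), "a": ("l1", "l2")}
leaves = {"l1": "01", "l2": "11", "l3": "10"}
labeling = fitch_strings(tree, "r", leaves, "10")
```

In the toy instance the root string "10" is admissible because $B(\rho) = \{1\}$ in the first coordinate and $B(\rho) = \{0,1\}$ in the second; choosing "11" at the root would give a second labeling of the same cost.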

9. Complexity result for #Binary
Here is our main result on binary trees. 

Theorem 9.1. #Binary is #P-complete. 


Proof. A proof similar to that of Lemma [HT3] shows that #Binary is in #P. To prove that #Binary is #P-hard,
we provide a polynomial reduction from #D3SAT.

Fix a D3CNF, $\Gamma = \bigwedge_{i \in [k]} c_i$ with $k$ clauses and $n$ variables. Let

$$\Psi(\Gamma) := \bigwedge_{i \in [k]} c_i \;\wedge\; \bigwedge_{i \in [n]} \Phi_i$$

with $2n$ variables, $\{v_1, v_2, \ldots, v_n, w_1, w_2, \ldots, w_n\}$, where each clause $c_i$ has three distinct literals from $\{v_i, \bar{v}_i : i \in [n]\}$, and $\Phi_i$ is the D3CNF in (24) which guarantees that, for each $i \in [n]$, $v_i$ and $w_i$ have different
truth values in a satisfying assignment. By Lemma 8.3 there is a bijection between the satisfying truth
assignments for $\Gamma$ and the satisfying truth assignments for $\Psi(\Gamma)$. We will construct a binary tree $\mathcal{B}$ and
define a labeling $\varphi$ of its leaves with binary strings in $B$ such that the number of satisfying truth assignments
for $\Psi(\Gamma)$ is directly computable from $Z_{T,\varphi}(B, x!)$, the number of most parsimonious scenarios for binary tree
$T$ with leaf labeling $\varphi$.

Each $\Phi_i$ has 4 clauses, so $\Psi(\Gamma)$ has $k + 4n$ clauses. Assign the names $c_{k+1}, \ldots, c_{k+4n}$ to the $4n$ clauses of $\bigwedge_{i \in [n]} \Phi_i$. Then

$$\Psi(\Gamma) = \bigwedge_{i \in [k+4n]} c_i.$$

For each $i \in [k + 4n]$, we define a binary tree $\mathcal{B}_i$ which encodes clause $c_i$. The final binary tree $\mathcal{B}$ will
join $\mathcal{B}_1, \ldots, \mathcal{B}_{k+4n}$ by a comb. For $t = 148(16n^2 + 8kn)(k + 4n)$, the leaf-labeling $\varphi : L(\mathcal{B}) \to \{0,1\}^{2n+t}$
will assign a binary string with coordinates $(x_1, y_1, \ldots, x_n, y_n, e_1, \ldots, e_t)$ to each leaf. The $x_i$ coordinates
will correspond to the $v_i$ variables and the $y_i$ coordinates will correspond to the $w_i$ variables of $\Psi(\Gamma)$. The
$e_i$ coordinates will be for additional ones, used in a manner similar to the additional ones in the previous
sections for star trees.

In this section and the next, we denote the left child of a non-leaf vertex $v$ by $v_l$ and the right child by
$v_r$. The height of a vertex is its graph distance from the root. The construction of $\mathcal{B}_i$ with its leaf labeling
$\varphi$ will come in Definition 9.5, but first we need some preliminary definitions.

For any clause $c = v_\alpha \vee v_\beta \vee v_\gamma$ which is the disjunction of 3 distinct literals, Miklós, Kiss, and Tannier
[2014] defined a unit subtree, $\mathcal{U}$, with 248 leaves. They also defined a leaf-labeling $\bar{\varphi} : L(\mathcal{U}) \to \{0,1\}^{151}$
where the binary strings in the range have coordinates $(x_\alpha, x_\beta, x_\gamma, e_1, e_2, \ldots, e_{148})$. The first three coordinates
correspond to the variables in $c$ and the remaining 148 coordinates are for additional ones. This unit subtree
has some useful properties which will be discussed after Definition 9.5.

For each $i \in [k + 4n]$, let $\mathcal{U}_i$ be the unit subtree for clause $c_i$. If $i \le k$ where $c_i$ relates $v_\alpha, v_\beta, v_\gamma$, then
$\mathcal{U}_i$ will have leaf labels with coordinates $\{x_\alpha, x_\beta, x_\gamma\}$ and 148 coordinates for additional ones. If $i > k$
where $c_i$ relates variables $v_\alpha, w_\alpha, v_{\alpha+1}$, then $\mathcal{U}_i$ will have leaf labels with coordinates $\{x_\alpha, y_\alpha, x_{\alpha+1}\}$ and 148
coordinates for additional ones.

Definition 9.2. The tree $\mathcal{T}_i$ in Step 5 of Definition 9.4 is a comb joining $16n^2 + 8kn$ copies of $\mathcal{U}_i$, as in
Figure 1.

Definition 9.3. For three literals $a, b, c$, we define $S(a,b,c)$ to be the complete binary tree of height 3 with
root $\rho$ with the vertices labeled with equations as follows:

(1) Assign the label "$a = 0$" to vertex $\rho_l$ and "$a = 1$" to $\rho_r$.

(2) For each vertex $u$ of height 1, assign the label "$b = 0$" to $u_l$ and "$b = 1$" to $u_r$.

(3) For each vertex $v$ of height 2, assign the label "$c = 0$" to $v_l$ and "$c = 1$" to $v_r$.

This tree is pictured on the right in Figure 2. We will use the representation on the left in place of $S(a,b,c)$
in future figures.
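
As a small illustration of Definition 9.3, the sketch below lists, from left to right, the equations that each of the 8 leaves of $S(a,b,c)$ carries once it inherits the labels of its ancestors. The function name and the dictionary representation are assumptions made only for this example.

```python
from itertools import product
from typing import Dict, List


def s_tree_leaf_equations(a: str, b: str, c: str) -> List[Dict[str, int]]:
    """Equations seen by the 8 leaves of S(a, b, c), listed left to right.

    A left child carries the value 0 and a right child the value 1; the
    vertices at height 1, 2 and 3 decide a, b and c respectively, so each
    leaf sees one complete assignment of (a, b, c).
    """
    return [{a: x, b: y, c: z} for x, y, z in product((0, 1), repeat=3)]


# Hypothetical coordinate names for one copy of S(y_1, y_2, y_3).
leaf_equations = s_tree_leaf_equations("y1", "y2", "y3")
```

The left-most leaf sees all three literals set to 0, which is the position singled out in Step (e) of Definition 9.4.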

Next we construct $\tilde{\mathcal{B}}_i$ which will have the same tree structure as the desired $\mathcal{B}_i$. However, $\tilde{\mathcal{B}}_i$ will have all
of its vertices labeled with equations while $\mathcal{B}_i$ will only have leaf labels which are binary strings. The leaf
labeling of $\mathcal{B}_i$ will be induced by the vertex labels of $\tilde{\mathcal{B}}_i$. Each leaf will essentially inherit the labels of its
ancestors.

Figure 1. Comb connecting $16n^2 + 8kn$ copies of $\mathcal{U}_i$ to create $\mathcal{T}_i$.

Figure 2. The labeled binary tree on the right is $S(a,b,c)$. The representation on the left
will be used in place of $S(a,b,c)$ in future figures.

Definition 9.4. Fix $i \in [k + 4n]$. Construct $\tilde{\mathcal{B}}_i$, a binary tree with vertex labels, as follows.

A. If $i \in [k]$, then say clause $c_i$ has variables $v_\alpha, v_\beta, v_\gamma$. The construction of $\tilde{\mathcal{B}}_i$ described below is drawn
in Figure 3.

(a) Draw a vertex $\rho^i$ with two children, $\rho^i_l$ and $\rho^i_r$.

(b) Label vertex $\rho^i_l$ with the equations "$x_j = y_j = 0$" for each $j \in [n] \setminus \{\alpha, \beta, \gamma\}$. Label $\rho^i_r$ with
"$x_j = y_j = 1$" for all $j \in [n] \setminus \{\alpha, \beta, \gamma\}$.

(c) From each of $\rho^i_l$ and $\rho^i_r$, hang a copy of $S(y_\alpha, y_\beta, y_\gamma)$.

(d) From each leaf of each copy of $S(y_\alpha, y_\beta, y_\gamma)$, hang a copy of $S(x_\alpha, x_\beta, x_\gamma)$.

(e) Delete the left-most copy of $S(x_\alpha, x_\beta, x_\gamma)$, the one which hangs below the vertices with labels
"$y_\alpha = 0$," "$y_\beta = 0$," "$y_\gamma = 0$," and with ancestor $\rho^i_l$, and replace it with a copy of the comb $\mathcal{T}_i$
from Definition 9.2.

B. If $i \in \{k + 1, \ldots, k + 4n\}$, then clause $c_i$ relates variables $v_\alpha, w_\alpha, v_{\alpha+1}$. The construction of $\tilde{\mathcal{B}}_i$
described below requires only a change of variables from the previous construction.

(a) Draw a vertex $\rho^i$ with two children, $\rho^i_l$ and $\rho^i_r$.

(b) Label $\rho^i_l$ with the system of equations "$x_j = y_j = 0$" for all $j \in [n] \setminus \{\alpha, \alpha+1, \alpha+2\}$. Label $\rho^i_r$
with equations "$x_j = y_j = 1$" for each $j \in [n] \setminus \{\alpha, \alpha+1, \alpha+2\}$.

(c) From each of $\rho^i_l$ and $\rho^i_r$, hang a copy of $S(y_{\alpha+1}, x_{\alpha+2}, y_{\alpha+2})$.

(d) Hang a copy of $S(x_\alpha, y_\alpha, x_{\alpha+1})$ from each leaf of each and every copy of $S(y_{\alpha+1}, x_{\alpha+2}, y_{\alpha+2})$.

(e) Delete the left-most copy of $S(x_\alpha, y_\alpha, x_{\alpha+1})$ and replace it with a copy of the comb $\mathcal{T}_i$ from
Definition 9.2.

Recall $\mathcal{B}$ is a comb connecting binary trees $\mathcal{B}_i$. The binary tree $\mathcal{B}$ has a leaf labeling $\varphi : L(\mathcal{B}) \to \{0,1\}^{2n+t}$
where

$$t = 148(16n^2 + 8kn)(k + 4n).$$

Each leaf label will have coordinates $(x_1, y_1, \ldots, x_n, y_n, e_1, \ldots, e_t)$. In the next definition, we define $\mathcal{B}_i$ and
values of $\varphi$ on the leaves of $\mathcal{B}_i$.

Definition 9.5. For each $i \in [k + 4n]$, the binary tree $\mathcal{B}_i$ will have the same tree structure as $\tilde{\mathcal{B}}_i$. We only
need to explain the labeling $\varphi : L(\mathcal{B}_i) \to \{0,1\}^{2n+t}$.

Partition $[t]$ into classes $E_{i,j}$ with $|E_{i,j}| = 148$ for each $i \in [k + 4n]$ and each $j \in [16n^2 + 8kn]$. Identify
the set $E_{i,j}$ with the $j$th copy of $\mathcal{U}_i$ in $\mathcal{T}_i$. Here we define $\varphi(\ell)$ for each leaf $\ell \in L(\mathcal{B}_i)$.
There are two cases:

• If leaf $\ell$ is not in subtree $\mathcal{T}_i$, then $\varphi(\ell)[e_s] = 0$ for all $s \in [t]$. The value of each $\varphi(\ell)[x_w]$ and $\varphi(\ell)[y_w]$
for $w \in [n]$ is inherited from the labels of the ancestors of $\ell$ as they appeared in $\tilde{\mathcal{B}}_i$.

• If leaf $\ell$ is in the subtree $\mathcal{T}_i$ within $\mathcal{B}_i$, then it is a leaf within the $j$th copy of unit subtree $\mathcal{U}_i$ for some
$j \in [16n^2 + 8kn]$. Recall $\bar{\varphi}(\ell) \in \{0,1\}^{151}$. Identify the coordinates $\{e_1, e_2, \ldots, e_{148}\}$ with the indices
in the 148 coordinates of $E_{i,j}$ in any order. If $e_s$ corresponds to coordinate $e_r$ for $r \in E_{i,j}$, then we
require $\varphi(\ell)[e_r] = \bar{\varphi}(\ell)[e_s]$. If $z$ is one of the three coordinates which correspond to a variable in $c_i$,
we also require that $\varphi(\ell)[z] = \bar{\varphi}(\ell)[z]$. Set $\varphi(\ell)[e_s] = 0$ for $s \notin E_{i,j}$. All other coordinates of $\varphi(\ell)$ will
take the value 0 (the value inherited from the labeling of their ancestors in $\tilde{\mathcal{B}}_i$).

Figure 3. The binary tree $\tilde{\mathcal{B}}_i$, for $i \in [k]$, created for clause $c_i = x_1 \vee x_2 \vee x_3$.


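Define $\varphi_{x_j} : L(\mathcal{B}) \to \{0,1\}$ so that $\varphi_{x_j}(\ell) = \varphi(\ell)[x_j]$. Define $\varphi_{y_j}$ and $\varphi_{e_j}$ similarly. We want to examine
the Fitch solutions on $\mathcal{B}$ for each $\varphi_{x_j}$, $\varphi_{y_j}$, and $\varphi_{e_j}$. We will first prove that the conditions of Lemma 8.4
hold and thus Fitch's algorithm finds all most parsimonious labelings for $\varphi$ on $\mathcal{B}$.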

We first explore the Fitch solutions for $\varphi_{e_j}$, $j \in [t]$.

Fact 9.6. Fix $j \in [t]$. There is only one $\ell \in L(\mathcal{B})$ with $\varphi_{e_j}(\ell) = 1$. After running Part 1 of Fitch's algorithm,
$B(\ell) = \{1\}$, the parent $v$ of $\ell$ has $B(v) = \{0,1\}$, and $B(u) = \{0\}$ for all other vertices. Consequently, Part
2 of Fitch's algorithm will output a most parsimonious labeling $\varphi'_{e_j}$ such that $\varphi'_{e_j}(\ell) = 1$ and for all other
vertices $u \in V(\mathcal{B})$, $\varphi'_{e_j}(u) = 0$.

Proof. These values of $B$ follow directly from the description of $\varphi(\ell)[e_j]$, for leaf $\ell$, which was given in
Definition 9.5. The conclusion follows from the definition of $\varphi'$ (Ell). □


Fact 9.7. For $j \in [t]$, there is only one most parsimonious labeling $\varphi'_{e_j}$ which extends leaf labeling $\varphi_{e_j}$ of $\mathcal{B}$.

Proof. Recall that most parsimonious labelings minimize the sum of Hamming distances between adjacent
vertices in the tree. The most parsimonious labeling obtained from Fitch's algorithm has

$$\sum_{uv \in E(\mathcal{B})} H(\varphi'_{e_j}(u), \varphi'_{e_j}(v)) = 1.$$

Because there is only one leaf $\ell$ with $\varphi_{e_j}(\ell) = 1$, the $\varphi'_{e_j}$ obtained from Fitch's algorithm is the only extension
of $\varphi_{e_j}$ with the sum of Hamming distances equal to 1. □


Fix $j \in [n]$. Next we consider the most parsimonious labelings for $\varphi_{x_j}$ on $\mathcal{B}$. The same arguments will
hold for each $\varphi_{y_j}$.

Run Part 1 of Fitch's algorithm on $\mathcal{B}$ with leaf labeling $\varphi_{x_j}$. For those clauses $c_i$ which contain variable
$v_j$, we have the following result.


Proposition 9.8 (Miklós, Kiss, and Tannier [2014]). Fix a clause $c_i$. Suppose variable $v_j$ is in $c_i$ with
coordinate $x_j$ corresponding to variable $v_j$. Let $r^i$ be the root of unit subtree $\mathcal{U}_i$ for $c_i$. Run Fitch's algorithm
on $\mathcal{U}_i$ with leaf labeling $\varphi_{x_j}$. The following hold:

(1) $B(r^i) = \{0,1\}$.

(2) For $u, v \in V(\mathcal{U}_i)$, if $v$ is a child of $u$, then $B(v) = \{0,1\} \Rightarrow B(u) = \{0,1\}$.
In a single copy of $S(a,b,c)$, all vertices of the same distance from the root either have $B(v) \in \{\{0\}, \{1\}\}$ or
all of them have $B(v) = \{0,1\}$. This fact together with Proposition 9.8 implies that, when Fitch's algorithm
is run on $\mathcal{B}$ with leaf-labeling $\varphi_{x_j}$, for any $u, v \in V(\mathcal{B})$ with $v$ a child of $u$,

$$B(v) = \{0,1\} \Rightarrow B(u) = \{0,1\}.$$

With this result and the structure of each $S(a,b,c)$, by Lemma 8.4 we can conclude that Fitch's algorithm
finds all most parsimonious labelings of $\mathcal{B}$ that extend $\varphi_{x_j}$. Further, $B(\rho) = \{0,1\}$ implies there are exactly
two such most parsimonious labelings.

As mentioned earlier, these results also hold for coordinate $y_j$. Fitch's algorithm finds the only two most
parsimonious labelings that extend $\varphi_{y_j}$ on $\mathcal{B}$.

For a most parsimonious labeling $\varphi'$ that extends $\varphi$, on each $v \in V(\mathcal{B})$, denote $\varphi'(v)[x_j]$ by $\varphi'_{x_j}(v)$. Likewise,
define the notations $\varphi'_{y_j}$ and $\varphi'_{e_s}$.

Lemma 9.9. For leaf labeling $\varphi$ of $\mathcal{B}$, Fitch's algorithm finds all most parsimonious labelings. Each is
characterized by the string it assigns to the root $\rho$ of $\mathcal{B}$ and there are precisely $2^{2n}$ most parsimonious
labelings, one for each root label in $\{0,1\}^{2n} \times \{0\}^t$.

Proof. Given a most parsimonious labeling $\varphi'$ that extends $\varphi$, each $\varphi'_{x_j}$, $\varphi'_{y_j}$, and $\varphi'_{e_s}$ is a most parsimonious
labeling for that coordinate. So it suffices to first find all most parsimonious labelings for the leaf labelings
$\varphi_{x_j}$, $\varphi_{y_j}$, $\varphi_{e_s}$ for all $j \in [n]$ and $s \in [t]$ and take combinations of these labelings.

We have already seen that Fitch's algorithm will find all most parsimonious labelings for $\varphi_{x_j}$ and $\varphi_{y_j}$, and
there are exactly 2 of each. Fitch's algorithm will also find the one and only most parsimonious labeling for
$\varphi_{e_s}$. Therefore, there are $2^{2n}$ most parsimonious labelings of $\mathcal{B}$ that extend $\varphi$. Part 2 of Fitch's algorithm
shows that each most parsimonious labeling is characterized by the string it assigns to $\rho$. Since $B(\rho) = \{0,1\}$
for each $\varphi_{x_j}$ and $\varphi_{y_j}$ and $B(\rho) = \{0\}$ for each $\varphi_{e_s}$, the possible strings for $\varphi'(\rho)$ are $\{0,1\}^{2n} \times \{0\}^t$. □

Set $\mathcal{M} := \{0,1\}^{2n} \times \{0\}^t$.

Definition 9.10. There is a bijection between $\mathcal{M}$ and the possible truth assignments for $\Psi(\Gamma)$. In particular,
given any $\eta \in \mathcal{M}$, define a truth assignment for the variables of $\Psi(\Gamma)$ as follows:

• For each $i \in [n]$, let $v_i$ be assigned the value true if $\eta[x_i] = 1$ and false otherwise.

• For each $i \in [n]$, let $w_i$ be assigned the value true if $\eta[y_i] = 1$ and false otherwise.

Define $\mathcal{M}_{\Psi(\Gamma)}$ to be the set of $\eta \in \mathcal{M}$ which correspond to satisfying truth assignments for $\Psi(\Gamma)$. Likewise,
for any $\Theta$, a clause or conjunction of clauses from $\Psi(\Gamma)$, define $\mathcal{M}_\Theta$ to be the set of $\eta \in \mathcal{M}$ which correspond
to satisfying truth assignments for $\Theta$.

Now we know that each most parsimonious labeling of $\mathcal{B}$ extending $\varphi$ is found using Fitch's algorithm and
is characterized by the binary string it assigns to the root. From here, we are interested in the number of
scenarios admitted by each of these most parsimonious labelings. Ultimately, we wish to make a distinction
between the binary strings in $\mathcal{M}_{\Psi(\Gamma)}$ and those in $\mathcal{M} \setminus \mathcal{M}_{\Psi(\Gamma)}$ by examining the number of scenarios admitted
by the corresponding most parsimonious labeling.

Let $\varphi'$ be a most parsimonious labeling for $\mathcal{B}$. The number of scenarios which are admitted by $\varphi'$ is
precisely

$$\mathfrak{n}(\varphi'(\rho)) := \prod_{uv \in E(\mathcal{B})} H(\varphi'(u), \varphi'(v))!.$$
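
The product of factorials above can be evaluated directly once a labeling of all vertices is known. The sketch below is only an illustration; the edge-list representation and the names `hamming` and `scenario_count` are assumptions made for the example.

```python
from math import factorial
from typing import Dict, Iterable, Tuple


def hamming(a: str, b: str) -> int:
    """Hamming distance between two binary strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def scenario_count(edges: Iterable[Tuple[str, str]], labeling: Dict[str, str]) -> int:
    """Product over all edges uv of H(label(u), label(v))!."""
    total = 1
    for u, v in edges:
        total *= factorial(hamming(labeling[u], labeling[v]))
    return total


# Hypothetical toy labeling: one edge of distance 2, one edge of distance 0.
toy_edges = [("r", "a"), ("r", "b")]
toy_labeling = {"r": "000", "a": "011", "b": "000"}
toy_count = scenario_count(toy_edges, toy_labeling)
```

Edges of Hamming distance 0 or 1 contribute a factor of 1, which is why the comb edges and the edges inside the copies of $S(a,b,c)$ do not affect the count.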

To calculate this, we partition the edges of $\mathcal{B}$ into 4 sets.

First, consider the edges of the comb which connects the trees $\mathcal{B}_i$ to form $\mathcal{B}$. Part 2 of Fitch's algorithm will
set $\varphi'(\rho) = \varphi'(\rho^i)$ where $\rho^i$ is the root of $\mathcal{B}_i$. So the Hamming distance along each of these edges is 0.

Next we look within each $\mathcal{B}_i$.

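Claim 9.11. Set $\Phi := \bigwedge_{i \in [n]} \Phi_i$ as defined in (25). For $i \in [k + 4n]$, let $\rho^i$ be the root of $\mathcal{B}_i$ with children $\rho^i_l$ and
$\rho^i_r$. Set $\eta := \varphi'(\rho^i)$. If $\eta \in \mathcal{M}_\Phi$, then

$$H(\eta, \varphi'(\rho^i_l)) = H(\eta, \varphi'(\rho^i_r)) = n - 3.$$

Otherwise

$$(n-3)!^2 \le H(\eta, \varphi'(\rho^i_l))! \cdot H(\eta, \varphi'(\rho^i_r))! \le (2n-6)!\,0!.$$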
Proof. Suppose $\eta \in \mathcal{M}_\Phi$. Then for each $j \in [n]$ considered in Step 2 of Definition 9.4 (there are $n - 3$ such
$j$), if $\eta[x_j] = 0$ then we have the following properties:

• $\eta[y_j] = 1$ because $\eta$ corresponds to a satisfying assignment for $\Phi$,

• $0 = \eta[x_j] \ne \varphi'(\rho^i_r)[x_j] = 1$,

• $1 = \eta[y_j] \ne \varphi'(\rho^i_l)[y_j] = 0$.

On the other hand, if $\eta[x_j] = 1$ then we have the following properties:

• $\eta[y_j] = 0$ because $\eta$ corresponds to a satisfying assignment for $\Phi$,

• $1 = \eta[x_j] \ne \varphi'(\rho^i_l)[x_j] = 0$,

• $0 = \eta[y_j] \ne \varphi'(\rho^i_r)[y_j] = 1$.

For each $s \in [t]$, $\eta[e_s] = \varphi'(\rho^i_l)[e_s] = \varphi'(\rho^i_r)[e_s] = 0$. For each $j \in [n]$ which was not considered in Step 2 of
Definition 9.4, $\eta[x_j] = \varphi'(\rho^i_l)[x_j] = \varphi'(\rho^i_r)[x_j]$ and $\eta[y_j] = \varphi'(\rho^i_l)[y_j] = \varphi'(\rho^i_r)[y_j]$ because the $B$ values (from
Fitch's algorithm) for these coordinates at these vertices will be $\{0,1\}$. Thus

$$H(\eta, \varphi'(\rho^i_l)) = H(\eta, \varphi'(\rho^i_r)) = n - 3.$$

Alternatively, if $\eta \notin \mathcal{M}_\Phi$, then $H(\eta, \varphi'(\rho^i_l)) + H(\eta, \varphi'(\rho^i_r)) = 2n - 6$ because each $x_j$ and each $y_j$ considered in Step 2 will
contribute 1 to one of the Hamming distances. Using the convexity of the factorial, this establishes the last
line of the claim. □
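
The convexity step can be checked numerically. The sketch below verifies, for small values of $m = n - 3$, that whenever two non-negative integers $a$ and $b$ sum to $2m$, the product $a!\,b!$ lies between $m!^2$ and $(2m)!\,0!$; the function name is an assumption made for this illustration.

```python
from math import factorial


def factorial_product_bounds_hold(m: int) -> bool:
    """Check m!^2 <= a! * b! <= (2m)! * 0! for every split a + b = 2m."""
    lower = factorial(m) ** 2
    upper = factorial(2 * m)  # (2m)! * 0!, and 0! = 1
    return all(
        lower <= factorial(a) * factorial(2 * m - a) <= upper
        for a in range(2 * m + 1)
    )


# Exhaustive check for m = 0, ..., 11.
bounds_hold_for_small_m = all(factorial_product_bounds_hold(m) for m in range(12))
```

The lower bound is attained at the balanced split $a = b = m$, which is the case $\eta \in \mathcal{M}_\Phi$ of Claim 9.11, and the upper bound at the extreme split $a = 2m$, $b = 0$.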


Based on the construction of $S(a,b,c)$, the Hamming distance $H(\varphi'(u), \varphi'(v))$ for each edge $uv$ in each
copy of $S(a,b,c)$ is exactly 1.

The only piece remaining is $\mathcal{T}_i$. We make the following remarks for the clause $c_i = v_1 \vee v_2 \vee v_3$ to make
the explanation easier. However, the arguments can be extended for any clause $c_i$ in $\Psi(\Gamma)$.

Fact 9.12. If $P$ is the root of $\mathcal{T}_i$ and $r^i$ is the root of one of the copies of $\mathcal{U}_i$ below $\mathcal{T}_i$, then running Fitch's
algorithm for each coordinate, we find

• $B(P) = B(r^i) = \{0,1\}$ for each $x_s$, $s \in [3]$, by Proposition 9.8.

• $B(P) = B(r^i) = \{0\}$ for each $x_s$, $s \ge 4$, by the construction of $\mathcal{B}_i$.

• $B(P) = B(r^i) = \{0\}$ for each $y_s$, $s \in [n]$, by the construction of $\mathcal{B}_i$.

• $B(P) = B(r^i) = \{0\}$ for each $e_s$, $s \in [t]$, because there is only one leaf $\ell \in L(\mathcal{B})$ with $\varphi(\ell)[e_s] = 1$.

Therefore, it is easy to see that, along the edges of the comb which connect the copies of $\mathcal{U}_i$, the Hamming
distances will be 0.

Next we turn our attention to a single copy of $\mathcal{U}_i$, say the $j$th copy.

Fact 9.13. Fix $\Gamma$ and build binary tree $\mathcal{B}$. Fix a most parsimonious labeling $\varphi'$ which extends leaf labeling
$\varphi$. For clause $c_i = v_1 \vee v_2 \vee v_3$, we have the following characteristics for each $v \in \mathcal{U}_i$:

• for $s \ge 4$, $\varphi'(v)[x_s] = 0$,

• for $s \in [n]$, $\varphi'(v)[y_s] = 0$,

• for $s \notin E_{i,j}$, $\varphi'(v)[e_s] = 0$.

Therefore, only the values of $\varphi'(v)$ on the coordinates $x_1, x_2, x_3$ and $e_s$ for $s \in E_{i,j}$ will affect the Hamming
distances along the edges in $\mathcal{U}_i$. These are precisely the 151 coordinates that appeared in the original labeling
$\bar{\varphi}$ of the leaves of $\mathcal{U}_i$ given by Miklós, Kiss, and Tannier [2014]. For each $v \in V(\mathcal{U}_i)$, define $\bar{\varphi}' : V(\mathcal{U}_i) \to
\{0,1\}^{151}$ so that $\bar{\varphi}'(v)$ is the restriction of $\varphi'(v)$ to these 151 coordinates. In particular, $\bar{\varphi}'$ is a most parsimonious
labeling on $\mathcal{U}_i$ which extends leaf labeling $\bar{\varphi}$.

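The following fact is a consequence of Fact 9.13.

Fact 9.14. Let $r^i$ be the root of $\mathcal{U}_i$. If $\bar{\varphi}'(r^i) = \varphi'(r^i)$, then for each $uv \in E(\mathcal{U}_i)$,

$$H(\varphi'(u), \varphi'(v)) = H(\bar{\varphi}'(u), \bar{\varphi}'(v)).$$

As a result,

$$\prod_{uv \in E(\mathcal{U}_i)} H(\varphi'(u), \varphi'(v))! = \prod_{uv \in E(\mathcal{U}_i)} H(\bar{\varphi}'(u), \bar{\varphi}'(v))!.$$

This is calculated as follows: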

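Fact 9.15 (Miklós, Kiss, and Tannier [2014]). Fix $i \in [k + 4n]$, the binary tree $\mathcal{U}_i$ with root $r^i$, and leaf-labeling
$\bar{\varphi}$. Then for any most parsimonious labeling $\bar{\varphi}'$ which extends $\bar{\varphi}$:

(1) If $\bar{\varphi}'(r^i)$ corresponds to a satisfying truth assignment for $c_i$, then

$$\prod_{uv \in E(\mathcal{U}_i)} H(\bar{\varphi}'(u), \bar{\varphi}'(v))! = 2^{156} \times 3^{64}.$$

(2) If $\bar{\varphi}'(r^i)$ corresponds to a truth assignment which does not satisfy $c_i$, then

$$\prod_{uv \in E(\mathcal{U}_i)} H(\bar{\varphi}'(u), \bar{\varphi}'(v))! = 2^{136} \times 3^{76}.$$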


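Since $\varphi'(\rho)$, $\varphi'(P)$, and $\varphi'(r^i)$ agree on the coordinates of the variables in $c_i$ (Fact 9.12), $\varphi'(\rho)$ corresponds to a satisfying truth assignment for $c_i$ if and only if $\bar{\varphi}'(r^i)$
also corresponds to a satisfying truth assignment for $c_i$.

As a result of the above discussion, we have proven the following claim.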

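Claim 9.16. Fix $i \in [k + 4n]$. If $\varphi'(\rho)$ corresponds to a satisfying truth assignment for clause $c_i$ and
$\bigwedge_{i \in [n]} \Phi_i$, then

$$\prod_{uv \in E(\mathcal{B}_i)} H(\varphi'(u), \varphi'(v))! = (n-3)!^2 \left(2^{156} \times 3^{64}\right)^{16n^2+8kn}.$$

If $\varphi'(\rho)$ corresponds to a truth assignment which does not satisfy $c_i$, then

$$\prod_{uv \in E(\mathcal{B}_i)} H(\varphi'(u), \varphi'(v))! \le (2n-6)! \left(2^{136} \times 3^{76}\right)^{16n^2+8kn}.$$

If $\varphi'(\rho)$ corresponds to a truth assignment which satisfies $c_i$ but does not satisfy $\bigwedge_{i \in [n]} \Phi_i$, then

$$(n-3)!^2 \left(2^{156} \times 3^{64}\right)^{16n^2+8kn} \le \prod_{uv \in E(\mathcal{B}_i)} H(\varphi'(u), \varphi'(v))! \le (2n-6)! \left(2^{156} \times 3^{64}\right)^{16n^2+8kn}.$$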


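Observe,

$$\begin{aligned}
\frac{(2n-6)!\,\left[2^{136} \times 3^{76}\right]^{16n^2+8kn}}{(n-3)!^2\,\left[2^{156} \times 3^{64}\right]^{16n^2+8kn}}
&= \binom{2n-6}{n-3} \left[\frac{3^{12}}{2^{20}}\right]^{16n^2+8kn} \\
&< 2^{2n} \left[\frac{3^{12}}{2^{20}}\right]^{16n^2+8kn} \\
&< 2^{2n+k} \left[\frac{3^{12}}{2^{20}}\right]^{16n^2+8kn} \\
&= \left[\frac{3^{12}}{2^{20-1/(8n)}}\right]^{16n^2+8kn} \\
&< \left[\frac{3^{12}}{2^{19.5}}\right]^{16n^2+8kn} \\
&< 1.
\end{aligned}$$

Consequently,

$$(2n-6)! \left(2^{136} \times 3^{76}\right)^{16n^2+8kn} < (n-3)!^2 \left(2^{156} \times 3^{64}\right)^{16n^2+8kn} \le (2n-6)! \left(2^{156} \times 3^{64}\right)^{16n^2+8kn}.$$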
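
The chain of inequalities can be sanity-checked with exact integer arithmetic. The sketch below is an illustration only; it tests the first and last expressions of the chain for a few small $n \ge 3$ and $k \ge 1$, and the function name is an assumption.

```python
from math import comb


def clause_gap_holds(n: int, k: int) -> bool:
    """Exact check that C(2n-6, n-3) * (3^12 / 2^20)^E < 1, where
    E = 16n^2 + 8kn, by clearing the denominator."""
    if n < 3 or k < 1:
        raise ValueError("this sketch assumes n >= 3 and k >= 1")
    E = 16 * n * n + 8 * k * n
    return comb(2 * n - 6, n - 3) * 3 ** (12 * E) < 2 ** (20 * E)


# The step 3^12 < 2^19.5 is equivalent to 3^24 < 2^39 after squaring.
base_step_holds = 3 ** 24 < 2 ** 39

small_cases_hold = all(
    clause_gap_holds(n, k) for n in range(3, 8) for k in range(1, 4)
)
```

Clearing denominators keeps the comparison in integers, so no floating-point rounding enters the check.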

Claim 9.17. If $\varphi'(\rho)$ corresponds to a satisfying truth assignment for $\Psi(\Gamma)$, then

$$\mathfrak{n}(\varphi'(\rho)) = \left[(n-3)!^2 \left(2^{156} \times 3^{64}\right)^{16n^2+8kn}\right]^{k+4n} =: B_{\text{good}}.$$

If $\varphi'(\rho)$ corresponds to a truth assignment which does not satisfy $\Psi(\Gamma)$, then there must be a clause $c_i$ for
some $i \in [k + 4n]$ which is not satisfied. Therefore,

$$\mathfrak{n}(\varphi'(\rho)) \le (2n-6)! \left(2^{136} \times 3^{76}\right)^{16n^2+8kn} \cdot \left[(2n-6)! \left(2^{156} \times 3^{64}\right)^{16n^2+8kn}\right]^{k+4n-1} =: B_{\text{bad}}.$$


Define

$$B_{\text{total}} := \sum_{\varphi'} \mathfrak{n}(\varphi'(\rho))$$

which is the total number of most parsimonious scenarios for $\mathcal{B}$ which extend leaf labeling $\varphi$, as in Definition [03

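Given only $B_{\text{total}}$, we would like to determine the number of satisfying truth assignments, $|S|$, for $\Psi(\Gamma)$.

$$\begin{aligned}
B_{\text{total}} &= \sum_{\eta' \in \mathcal{M}_{\Psi(\Gamma)}} \mathfrak{n}(\eta') + \sum_{\eta' \in \mathcal{M} \setminus \mathcal{M}_{\Psi(\Gamma)}} \mathfrak{n}(\eta') \\
&= |S|\, B_{\text{good}} + \sum_{\eta' \in \mathcal{M} \setminus \mathcal{M}_{\Psi(\Gamma)}} \mathfrak{n}(\eta').
\end{aligned}$$

As long as $\sum_{\eta' \in \mathcal{M} \setminus \mathcal{M}_{\Psi(\Gamma)}} \mathfrak{n}(\eta') < B_{\text{good}}$, we can conclude that the number of satisfying truth assignments
for $\Psi(\Gamma)$ (and for $\Gamma$) is precisely

$$\left\lfloor \frac{B_{\text{total}}}{B_{\text{good}}} \right\rfloor.$$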

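Observe, for $n > 2$,

$$\begin{aligned}
\frac{2^{2n} B_{\text{bad}}}{B_{\text{good}}}
&= 2^{2n} \binom{2n-6}{n-3}^{k+4n} \left[\frac{3^{12}}{2^{20}}\right]^{16n^2+8kn} \\
&< 2^{2n} \cdot 2^{2n(k+4n)} \left[\frac{3^{12}}{2^{20}}\right]^{16n^2+8kn} \\
&= 2^{8n^2+2kn+2n} \left[\frac{3^{12}}{2^{20}}\right]^{16n^2+8kn} \\
&\le 2^{8n^2+4kn} \left[\frac{3^{12}}{2^{20}}\right]^{16n^2+8kn} \\
&= \left[\frac{3^{12}}{2^{20-1/2}}\right]^{16n^2+8kn} \\
&< 1.
\end{aligned}$$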


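Because there are only $2^{2n}$ truth assignments and $2^{2n}$ most parsimonious labelings, we obtain our desired
result:

$$\sum_{\eta' \in \mathcal{M} \setminus \mathcal{M}_{\Psi(\Gamma)}} \mathfrak{n}(\eta') \le \sum_{\eta' \in \mathcal{M} \setminus \mathcal{M}_{\Psi(\Gamma)}} B_{\text{bad}} \le 2^{2n} B_{\text{bad}} < B_{\text{good}}.$$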

Therefore, if we could determine the total number of most parsimonious scenarios for this binary tree
in polynomial time, then we could obtain the total number of satisfying assignments for $\Psi(\Gamma)$ and for $\Gamma$ in
polynomial time. This completes the proof. □
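
The final counting step can be illustrated numerically. The sketch below assumes the constants of Claims 9.16 and 9.17 ($2^{156} \times 3^{64}$ for a satisfied clause and $2^{136} \times 3^{76}$ for an unsatisfied one) and recovers the number of satisfying assignments from a synthetic total by integer division; the function names and the synthetic instance are assumptions made for this example and are not derived from an actual formula $\Gamma$.

```python
from math import factorial

SAT_UNIT = 2 ** 156 * 3 ** 64    # unit subtree count, clause satisfied
UNSAT_UNIT = 2 ** 136 * 3 ** 76  # unit subtree count, clause not satisfied


def copies(n: int, k: int) -> int:
    """Number of unit subtrees in each comb: 16n^2 + 8kn."""
    return 16 * n * n + 8 * k * n


def b_good(n: int, k: int) -> int:
    """Scenario count of a labeling whose root string satisfies Psi(Gamma)."""
    return (factorial(n - 3) ** 2 * SAT_UNIT ** copies(n, k)) ** (k + 4 * n)


def b_bad(n: int, k: int) -> int:
    """Upper bound on the scenario count of a non-satisfying root string."""
    e = copies(n, k)
    one_bad_clause = factorial(2 * n - 6) * UNSAT_UNIT ** e
    other_clauses = (factorial(2 * n - 6) * SAT_UNIT ** e) ** (k + 4 * n - 1)
    return one_bad_clause * other_clauses


def count_satisfying(b_total: int, n: int, k: int) -> int:
    """Recover |S| as the floor of B_total / B_good."""
    return b_total // b_good(n, k)


# Synthetic instance: pretend 5 of the 4^n root strings are satisfying and
# every other root string attains the upper bound B_bad.
n, k, satisfying = 3, 1, 5
synthetic_total = satisfying * b_good(n, k) + (4 ** n - satisfying) * b_bad(n, k)
recovered = count_satisfying(synthetic_total, n, k)
```

Because $2^{2n} B_{\text{bad}} < B_{\text{good}}$, the contribution of the non-satisfying root strings never reaches one full multiple of $B_{\text{good}}$, so the floor of the quotient is exactly $|S|$.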


10. Conclusion 

We proved that it is #P-complete to calculate the partition function $Z(B, x!)$. However, the existence
of an FPAUS for this quantity has not yet been established. Following a number of results relating to
calculating $Z(B, f(x))$ exactly for various functions $f(x)$, we were able to prove that, when $\log f(x)$ is
strictly decreasing, under mild conditions an FPAUS exists for $Z(B, f(x))$ only if RP = NP. The question
of approximating $Z(B, f(x))$ when $\log f(x)$ is strictly increasing remains unsettled. We concluded with a
#P-completeness result for the extension of the partition function to binary trees, a natural extension to the
bioinformatics interpretation of $Z(B, x!)$.